Why doesn't "./myprog | file" work like "cat file | ./myprog"? - python

Hello, I am new to Python and would like to understand the different ways of running a Python program. For example, my program runs fine in the first mode below, but the second mode gives me an error.
cat inputfile | ./pythonprogram.py - works
./pythonprogram.py | inputfile -- doesn't work
Also, what are the best practices for running a program that parses input files?
FYI, I am on the Google Python babynames exercise, and below is my program.
PS: I haven't written a complete, polished program yet; this is more of a draft before attempting the full exercise.

Did you perhaps mean:
./pythonprogram.py < inputfile
This redirects the contents of inputfile to your program's standard input.
On the other hand:
./pythonprogram.py | inputfile
will take the output of your Python program, try to execute inputfile as a command (which it can't), and feed it that output on its standard input.
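To make the distinction concrete, here is a minimal sketch of a stdin-reading script (a hypothetical stand-in for pythonprogram.py, not the babynames code itself). The script only ever sees a stream, so `cat inputfile | ./pythonprogram.py` and `./pythonprogram.py < inputfile` look identical to it:

```python
import io

def number_lines(stream):
    """Return each line of `stream` prefixed with its 1-based line number."""
    return [f"{i} {line.rstrip()}" for i, line in enumerate(stream, 1)]

# In the real script this would be number_lines(sys.stdin); the script
# cannot tell whether stdin came from a pipe (`cat f | prog`) or a
# redirection (`prog < f`) -- the shell sets it up either way.
print(number_lines(io.StringIO("first\nsecond\n")))
```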

Related

Using subprocess to pipe and then write to a file

I am writing a python script using subprocess. I would like to get the following bash behaviour...
cat input_file.txt | awk '/pattern/' > output_file.txt
I wrote this code, but it does not work...
import shlex
import subprocess

cat = subprocess.Popen(shlex.split("cat " + input_file_name), stdout=subprocess.PIPE)
output_file = open('output_file.txt', 'wb')
awk_search = subprocess.run(shlex.split("awk '/pattern/'"), stdin=cat.stdout, stdout=output_file)
When I try this step by step in the python terminal things work until I attempt to output to a file.
Could you please help me? Thanks!
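A working sketch of the same pipeline (assuming awk is on PATH; the sample input contents are invented for illustration): the separate cat process is unnecessary, because an open file object can be passed directly as the child's stdin:

```python
import shlex
import subprocess

# Create a small sample input (stand-in for the real input_file.txt).
with open("input_file.txt", "w") as f:
    f.write("has pattern here\nno match\nanother pattern line\n")

# Equivalent of: cat input_file.txt | awk '/pattern/' > output_file.txt
# No cat process is needed: opening the files here does the shell's job.
with open("input_file.txt", "rb") as infile, open("output_file.txt", "wb") as outfile:
    subprocess.run(shlex.split("awk '/pattern/'"),
                   stdin=infile, stdout=outfile, check=True)
```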

How to redirect input to python script

How come the command python test.py <(cat file1.txt) does not work as expected? I could've sworn I had this working previously. Basically, I would like to get the output of that cat command as input to the Python script.
This command
cat file1.txt | python test.py
works okay, which outputs:
reading: file11
reading: file12
Which are based on the following scripts/files below.
The reason I want this to work is because I really want to feed in 2 input files like
python test.py <(cat file1.txt) <(cat file2.txt)
And would like some python output like:
reading: file11 file21
reading: file12 file22
I know this is a very simple example, and I could just open() both files inside the Python script and iterate accordingly. But this is a simplified version of my actual scenario: the cat command is really another executable doing other things, so it's not as easy as just opening and reading the files.
Sample script/files:
test.py:
import sys
for line in sys.stdin:
    print("reading: ", line.strip())
sys.stdin.close()
file1.txt:
file11
file12
file2.txt:
file21
file22
changing test.py to:
import sys
input1 = open(sys.argv[1], "r")
input2 = open(sys.argv[2], "r")
for line1, line2 in zip(input1, input2):
    print("reading: ", line1.strip(), line2.strip())
input1.close()
input2.close()
will enable python test.py <(cat file1.txt) <(cat file2.txt) to work
Actually, it depends on the shell you are using.
I guess you use bash, which unfortunately can't make this work, as only the last redirection from a given descriptor takes effect. You could create a temporary file, redirect the output of the scripts to it, and then feed your main script with that temporary file.
Or, if you don't mind, you can switch to e.g. zsh, which has this feature enabled by default.

Python: get name of file piped to stdin

I have a python program that reads input from stdin (required), and processes lines from stdin:
import sys

for lines in sys.stdin:
    # do stuff to lines
    filename = ...  # need file name
    # example print
    print(filename)
However, in this for loop, I also need to get the name of the file that has been piped in to this python program like this:
cat document.txt | pythonFile.py #should print document.txt with the example print
Is there a way to do this?
No, this is not possible. As the receiving end of a pipe, you have no knowledge of where your data stream is coming from. The use of cat further obfuscates it, but even if you wrote ./pythonFile.py < document.txt you would have no clue.
Many Unix tools accept filenames as arguments, with - as a special code for 'stdin'. You could design your script the same way, so it can be called like:
cat document.txt | pythonFile.py - (your script doesn't know the input origin)
./pythonFile.py document.txt (your script does know the file)
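A sketch of that filename-or-"-" convention (read_sources and its exact behaviour are illustrative, not from the question):

```python
import sys

def read_sources(names):
    """Yield (name, line) pairs; "-" (or no names at all) means stdin."""
    for name in names or ["-"]:
        if name == "-":
            for line in sys.stdin:   # origin unknown: pipe, redirection, tty
                yield "<stdin>", line.rstrip("\n")
        else:
            with open(name) as f:    # here the script *does* know the name
                for line in f:
                    yield name, line.rstrip("\n")
```

Invoked as ./pythonFile.py document.txt the script can print the real name; invoked as cat document.txt | ./pythonFile.py - it can only report that the lines came from stdin.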

Python/Unix os.system: Can anyone tell me why this piece of code using awk does not work when executed with os.system?

I have been working with this piece of code for some time in Unix systems and it works just fine while running on a normal command line. However, for the sake of a project and learning how to execute Unix commands through Python, I am trying to run it using the os.system() command in Python.
The data has 5 columns and 1500 rows, and the goal is to replace values of 2.706 or greater in column 4 ($4) with -999 and save the result to file2.txt, keeping all other values unchanged.
os.system("awk '{print $1,$2,$3,$5,($4>=2.706)? -999 : $4}' file1.txt > file2.txt")
From this code, I receive an Invalid syntax message when attempting to execute it from a Python script.
As I am new to Python, I believe I must just missing something in the syntax from that side of the code, but I cannot for the life of me figure it out. Any help would be greatly appreciated.
A new attempt with the same code, but using the subprocess module instead of os.system:
arg1 = "awk '{print $1,$2,$3,$5,($4>=2.7059553)\? -999 \: $4}' phenotypes.txt > replacetest.txt"
subprocess.run(arg1, shell=TRUE)
This code also gives the Invalid syntax response to the creation of the arg1 command.
(Code is being run in Python 2.7.5 on Linux2.)
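For what it's worth, a corrected sketch of that second attempt (assuming Python 3, where subprocess.run exists, and awk on PATH; the sample rows are invented): shell=True must be the Python constant True, the backslashes before ? and : are not needed, and shell=True is what makes the > redirection work:

```python
import subprocess

# Sample data: 5 columns; column 4 sometimes reaches the threshold.
with open("phenotypes.txt", "w") as f:
    f.write("a b c 3.1 e\n")
    f.write("a b c 1.0 e\n")

# Corrected attempt: shell=True (not TRUE) and no backslash escapes.
cmd = "awk '{print $1,$2,$3,$5,(($4>=2.706)? -999 : $4)}' phenotypes.txt > replacetest.txt"
subprocess.run(cmd, shell=True, check=True)
```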

Problems with running a python script over many files

I am on a Linux (Ubuntu 11.10) machine, using the Bourne-again shell (bash).
I have to process a directory full of files with a Python script. My colleague wrote the script and I have successfully used it before on one file at a time. It takes two arguments: a path to the file to be processed, enclosed in quotes, and a second argument called -min which takes an integer. The script writes to standard out.
From my experience of shell scripting and following others on this forum, I used the following method to iterate over the directory of files:
for f in path/to/data_directory/*; do
path/to/pythonscript.py $f -min 1 > path/to/out_directory/$f;
done
I get the desired file names in the out_directory, and the content of each is something only the Python script could have written. That is, the above for loop successfully passes the files to the script. However, the content of each file is completely wrong (i.e., the computation the script performed was wrong). When I run the Python script directly on one of the files in the data_directory, the output file has the correct content.
The thing that makes it more complex is that the same shell method (the for loop) works perfectly in the Mac OS X my colleague has.
Where is the issue? Am I missing something very fundamental about Linux shells? Maybe it's a syntax error?
Any help will be appreciated.
Update: I just ran the for loop again but instead of pointing it to the data_directory of files, I pointed it to a file within the data_directory. I had the same problem - the python script did not compute the correct result.
The only problem I see is that filenames may contain whitespace, so you should quote them:
for f in path/to/data_directory/*; do
path/to/pythonscript.py "$f" -min 1 > "path/to/out_directory/$f"
done
Well, I don't know if this helps, but
path/to/pythonscript.py $f -min 1 > path/to/out_directory/$f
substitutes out to
path/to/pythonscript.py path/to/data_directory/myfile -min 1 > path/to/out_directory/path/to/data_directory/myfile
The script should instead be:
cd path/to/data_directory
for f in *; do
path/to/pythonscript.py $f -min 1 > path/to/out_directory/$f
done
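The same loop can also be sketched from Python itself (process_directory is illustrative, and the script name and paths are placeholders). Writing only the basename into out_dir avoids the problem above, where $f carries the whole data-directory path into the redirection:

```python
import os
import subprocess

def process_directory(script, extra_args, data_dir, out_dir):
    """Run `script <file> <extra_args...>` per file, capturing stdout per file."""
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(data_dir)):
        in_path = os.path.join(data_dir, name)
        out_path = os.path.join(out_dir, name)   # basename only, no nested path
        with open(out_path, "w") as out:
            subprocess.run([script, in_path] + extra_args, stdout=out, check=True)

# e.g. process_directory("path/to/pythonscript.py", ["-min", "1"],
#                        "path/to/data_directory", "path/to/out_directory")
```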
What version of bash are you running?
What do you get if you run this script?
cd path/to/data_directory
for f in *; do
echo $f > /tmp/$f
done
Of course, that should give you a bunch of files containing their own file names.
