Python subprocess() reading file in bash - python

I have a shell command for a file as given below:
filename="/4_illumina/gt_seq/gt_seq_proccessor/200804_MN01111_0025_A000H35TCJ/fastq_files/raw_data/200804_MN01111_0025_A000H35TCJ.demultiplex.log"
assembled_reads=$(cat $filename | grep -i " Assembled reads ...................:" | grep -v "Assembled reads file...............:")
Now I am trying to run this within a python environment using subprocess as:
task = subprocess.Popen("cat $filename | grep -i " Assembled reads ...................:" | grep -v "Assembled reads file...............:"", shell=True, stdout=subprocess.PIPE)
p_stdout = task.stdout.read()
print (p_stdout)
This is not working because I am not able to pass the filename variable from Python to the shell, and there is probably a syntax error in the way I have quoted the grep command.
Any suggestions?

This code seems to solve your problem with no external tools required.
filename="/4_illumina/gt_seq/gt_seq_proccessor/200804_MN01111_0025_A000H35TCJ/fastq_files/raw_data/200804_MN01111_0025_A000H35TCJ.demultiplex.log"
for line in open(filename):
if "Assembled reads" in line and "Assembled reads file" not in line:
print(line.rstrip())
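If the goal is the numeric count rather than the whole line, the same idea extends naturally. A minimal sketch, assuming the log line puts the count after the colon (the helper name is hypothetical):
import re

def assembled_read_count(path):
    # Return the count from the "Assembled reads ...: N" line,
    # skipping the "Assembled reads file" line, or None if absent.
    with open(path) as handle:
        for line in handle:
            if "Assembled reads" in line and "Assembled reads file" not in line:
                match = re.search(r":\s*([\d,]+)", line)
                if match:
                    return int(match.group(1).replace(",", ""))
    return None

print(assembled_read_count(filename))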

I would consider doing all the reading and searching in Python, and maybe rethinking what you want to achieve. However:
In a shell:
$ export filename=/tmp/x-output.GOtV
In Python (note the access to $filename and the mixed quotes in the command; I also use a simpler grep command to keep the example short):
import os
import subprocess
# Read $filename from the caller's environment and splice it into the shell command.
tmp = subprocess.Popen(f"cat {os.environ['filename']} | grep -i 'x'", shell=True, stdout=subprocess.PIPE)
data = tmp.stdout.read()
print(data)
Though it works, this is ... not what I would consider clean code.
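If the shell is not actually needed, a cleaner sketch (same $filename environment variable, with the filtering done in Python instead of grep):
import os

# No shell, no cat, no grep: read the file and filter in Python.
filename = os.environ["filename"]
with open(filename) as handle:
    for line in handle:
        if "x" in line.lower():  # same effect as grep -i 'x'
            print(line.rstrip())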

Related

Python Subprocess: fails to read "|" in Shell Command

I'm trying to use a Go binary as well as some shell tools in a Python script. It's a chained command using |; in summary, the command looks like this:
address = "http://line.me"
commando = f"echo {address} | /root/go/bin/crawler | grep -E --line-buffered '^200'"
The code above is just a demonstration; the actual code reads address from a wordlist. My first try used os.system, and it fails.
read = os.system(commando)
print(read)
It turns out os.system doesn't return the command's output (only its exit status), so I had to use subprocess:
commando = subprocess.Popen(commando, shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
commandos = commando.stdout.read() + commando.stderr.read()
print(commandos)
Mentioning shell=True triggers:
b'/bin/sh: 1: Syntax error: "|" unexpected\n'
Through more reading, it could be because sh can't handle | here, or I need to use bash. Is there any alternative to this? I have been trying to add a shebang line to the commando variable:
#!/bin/bash
Still no luck...
Try this way:
subprocess.call(commando, shell=True)
The following code worked for me!
subprocess.call('ls | grep x | grep y', shell=True)
Fixed by explicitly mentioning bash:
['bash', '-c', commando]
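Putting that together, a minimal sketch of the bash -c fix (the crawler path and grep filter come from the question; whether that binary exists is obviously environment-specific):
import subprocess

address = "http://line.me"
commando = f"echo {address} | /root/go/bin/crawler | grep -E --line-buffered '^200'"

# Run the pipeline under bash explicitly and capture both streams.
# communicate() avoids the deadlock risk of reading the two pipes one after the other.
proc = subprocess.Popen(['bash', '-c', commando], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
print(out + err)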

subprocess command execution

What is the best way to execute the below command in Python in a single line?
echo $(readlink /sys/dev/block/$(mountpoint -d /))
Running the pieces individually with os.system(cmd) works: executing "mountpoint -d /" first, taking the output, and substituting it into "readlink /sys/dev/block/{0}".format(out.strip()). But running the combined command with subprocess, subprocess.Popen, or subprocess.check_output raises CalledProcessError.
cmd = "echo $(readlink /sys/dev/block/$(mountpoint -d /))"
You can call the subcommands separately, and use Python itself to read the link:
import subprocess
import os
path = "/"
# $(mountpoint -d /): the major:minor number of the device mounted at /
device = subprocess.run(["mountpoint", "-d", path], stdout=subprocess.PIPE, encoding="utf8").stdout.strip()
# readlink /sys/dev/block/<maj:min>: resolve the symlink in Python instead of a shell
link = os.readlink("/sys/dev/block/" + device)
print(link)
You probably want to use something like the following:
cmd = "bash -c 'echo $(readlink /sys/dev/block/$(mountpoint -d /))'"
echo doesn't expand $() blocks; your shell does, so you have to invoke a shell. os.system(cmd) should then work.
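For completeness, a hedged sketch of that approach using subprocess instead of os.system, so the output can be captured rather than just echoed:
import subprocess

cmd = "bash -c 'echo $(readlink /sys/dev/block/$(mountpoint -d /))'"
# shell=True hands the string to /bin/sh, which then starts bash -c with the pipeline.
result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, encoding="utf8")
print(result.stdout.strip())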

Why do we lose stdout from a terminal after running subprocess.check_output(xyz, shell=True)?

I have this bash line:
$ printf ' Number of xml files: %s\n' `find . -name '*.xml' | wc -l`
Number of xml files: 4
$
When I run it from Python in this way, the interpreter stops and my terminal no longer shows stdout:
$ ls
input aa bb
$ python
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
>>>
>>> import subprocess
>>> cmd = "printf 'xml files: %s\n' `find . -name '*.xml' | wc -l`"
>>> subprocess.check_output(['/bin/bash', cmd], shell=True)
$ ls # stdout is not seen any more I have to kill this terminal
$
Obviously the question here is not how to make this bash command work from Python:
>>> import subprocess
>>> cmd = "printf 'xml files: %s\n' `find . -name '*.xml' | wc -l`"
>>> out = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)
>>> print(str(out.stdout, 'utf8'))
xml files: 4
>>>
The two following questions, No output from subprocess.check_output() and Why is terminal blank after running python executable?, do not answer this.
The short version is that check_output is buffering all the output to return. When you run ls, its standard output is going to check_output's buffer, not the terminal. When you exit the shell you are currently in, you'll get all the output at once as a single Python string.
This leads to the question: why are you getting a subshell in the first place, instead of executing the contents of cmd? First, you are using bash wrong; its argument is a file to run, not an arbitrary command line. A more correct version of what you are doing would be
cmd = "printf 'xml files: %s\n' `find . -name '*.xml' | wc -l`"
subprocess.check_output(['/bin/bash', '-c', cmd])
or, if you want subprocess to run a shell for you, instead of explicitly executing it,
subprocess.check_output(cmd, shell=True)
Combining the list argument with shell=True is almost never what you want.
Second, given your original code and shell=True, the list is not joined into one command; instead the first element is passed to sh -c as the command and the remaining elements become extra arguments. That is, you actually execute something like
sh -c /bin/bash "printf 'xml files: %s\n' `find . -name '*.xml' | wc -l`"
sh runs /bin/bash, and your command string is just used as an additional argument to sh which, for the purposes of this question, we can assume is ignored. So you are in an interactive shell whose standard output is buffered instead of displayed, as described in the first part of this answer.
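For completeness, a minimal runnable version of the fixed call that also prints the captured output (a sketch using the same cmd string as above):
import subprocess

cmd = "printf 'xml files: %s\n' `find . -name '*.xml' | wc -l`"
# bash -c receives the whole pipeline as one string; check_output returns bytes.
out = subprocess.check_output(['/bin/bash', '-c', cmd])
print(out.decode('utf8'), end='')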

Inserting bash script in python code

I am trying to execute a bash script from Python code. The bash script has some grep commands in a pipe inside a for loop. When I run the bash script by itself it gives no errors, but when I call it from the Python code it says: grep: write error.
The command that I call in python is:
subprocess.call("./change_names.sh",shell=True)
The bash script is:
#!/usr/bin/env bash
for file in *.bam;do new_file=`samtools view -h $file | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`;rename s/$file/$new_file.bam/ $file;done
What am I missing?
You should not use shell=True when you are running a simple command which doesn't require the shell for anything in the command line.
subprocess.call(["./change_names.sh"])
There are multiple problems in the shell script. Here is a commented refactoring.
#!/usr/bin/env bash
for file in *.bam; do
# Use modern command substitution syntax; fix quoting
new_file=$(samtools view -h "$file" |
grep -P '\tSM:' |
# refactor to a single sed script
sed -n 's/.\+SM:\([^\t]\+\).*/\1/p;q')
# Fix quoting some more; don't use rename
mv "$file" "$new_file.bam"
done
grep -P doesn't seem to be necessary or useful here, but without an example of what the input looks like, I'm hesitant to refactor that into the sed script too. I hope I have guessed correctly what your sed version does with the \+ and \t escapes, which aren't entirely portable.
This will still produce a warning that you are not reading all of the output from grep in some circumstances. A better solution is probably to refactor even more of this into your Python script.
import glob
import os
import subprocess

for file in glob.glob('*.bam'):
    # text=True so check_output returns str rather than bytes
    header = subprocess.check_output(['samtools', 'view', '-h', file], text=True)
    for line in header.split('\n'):
        if '\tSM:' in line:
            # take what follows SM: up to the next tab, mirroring the sed logic
            dest = line.split('SM:', 1)[-1].split('\t')[0] + '.bam'
            os.rename(file, dest)
            break
Hi, try the modification below, which should fix your issue:
for file in *.bam;do new_file=`unbuffer samtools view -h $file | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`;rename s/$file/$new_file.bam/ $file;done
Or else try redirecting standard error to /dev/null, like below:
for file in *.bam;do new_file=`samtools view -h $file 2>/dev/null | grep -P '\tSM:' | head -n 1 | sed 's/.\+SM:\(.\+\)/\1/' | sed 's/\t.\+//'`;rename s/$file/$new_file.bam/ $file;done
Your actual issue is with the command samtools view -h $file. When you run the script from Python, you should provide a full path, like below:
/fullpath/samtools view -h $file
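If hard-coding the path is undesirable, one alternative (a sketch, not from the original answer) is to resolve the binary from Python first; shutil.which performs the PATH lookup:
import shutil

# Returns e.g. '/usr/bin/samtools', or None if the binary is not on PATH.
samtools = shutil.which('samtools')
if samtools is None:
    raise FileNotFoundError('samtools not found on PATH')
print(samtools)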

Why are the results different when I run the command in Linux directly versus running it via the os module in Python?

I am doing two "similar" things:
(1) in python
import os
os.system('cat ...input | awk -f ...awk -v seed=$RANDOM')
(2) in linux terminal
cat ...input | awk -f ...awk -v seed=$RANDOM
My awk script returns a randomized version of the input file, but if I run it via (1) many times, the result is always the same (only one result). If I run it via (2), I get a differently randomized file every time. What's wrong?
If I want to run this command in Python, how should I do it?
Thank you so much for your answers.
EDIT:
Adding the actual code:
(1) in python
import os
os.system("cat data/MD-00001-00000100.input | awk -f utils/add_random_real_weights.awk -v seed=$RANDOM")
(2) in linux:
cat data/MD-00001-00000100.input | awk -f utils/add_random_real_weights.awk -v seed=$RANDOM
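One plausible explanation, offered as an assumption: os.system runs its command through /bin/sh, and $RANDOM is a bash/ksh feature that plain POSIX sh does not set, so the seed may always expand to an empty string and the awk script keeps receiving the same (missing) seed. A sketch that sidesteps this by generating the seed in Python:
import random
import subprocess

# Mirror $RANDOM's 0-32767 range, but generate the value in Python.
seed = random.randint(0, 32767)
cmd = f"cat data/MD-00001-00000100.input | awk -f utils/add_random_real_weights.awk -v seed={seed}"
subprocess.run(cmd, shell=True)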
