I would like to loop over files using subprocess.run(), something like:
import os
import subprocess
path = os.chdir("/test")
files = []
for file in os.listdir(path):
    if file.endswith(".bam"):
        files.append(file)
for file in files:
    process = subprocess.run("java -jar picard.jar CollectHsMetrics I=file", shell=True)
How do I correctly call the files?
shell=True is insecure if the command includes user input. @eatmeimadanish's answer allows anybody who can write a file in /test to execute arbitrary code on your machine, because the file name is pasted straight into a shell command line. This is a huge security vulnerability!
Instead, supply a list of command-line arguments to the subprocess.run call. You likely also want to pass check=True; otherwise, your program would finish without an exception if the java command fails!
import os
import subprocess
os.chdir("/test")
for file in os.listdir("."):
    if file.endswith(".bam"):
        subprocess.run(
            ["java", "-jar", "picard.jar", "CollectHsMetrics", "I=" + file], check=True)
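To see why the list form is safer, here is a small sketch that does not need java or picard.jar at all. It assumes only that a Python interpreter is available as sys.executable; echo_argument is a name made up for this illustration. The child process simply prints the single argument it was given, so a file name containing shell metacharacters arrives as plain text instead of being interpreted.

```python
import subprocess
import sys


def echo_argument(filename):
    """Pass "I=<filename>" as ONE list element and return what the child saw.

    The child is a stand-in for the real tool: it just prints sys.argv[1].
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "I=" + filename],
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # With shell=True and string formatting, the part after ";" would run
    # as a second command. In list form it is just part of the argument.
    print(echo_argument("sample.bam; echo pwned"))
```

No shell is involved here, so nothing in the file name is ever parsed as a command.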
Seems like you might be overcomplicating it.
import os
import subprocess
path = os.chdir("/test")
for file in os.listdir(path):
    if file.endswith(".bam"):
        subprocess.run("java -jar picard.jar CollectHsMetrics I={}".format(file), shell=True)
Related
I am trying to run a bash script multiple times on a cluster. The issue however is I need to grab certain file names to fill the command which I only know how to do via python.
Note: I want to run the last line (the line that calls the script) in parallel, in groups of about two. How can I do this?
I have thought of writing all the commands to a .txt file and catting that in parallel. However, I feel that this is not the most efficient approach.
Thank you for any help
The script looks like this:
#!/usr/bin/python
import os
import sys
cwd = os.getcwd()
for filename in os.listdir(cwd):
    if "_2_" in filename:
        continue
    elif "_1_" in filename:
        in1 = os.path.join(cwd, filename)
        secondread = filename.replace("_1.fastq_1_trimmed.fq","_1.fastq_2_trimmed.fq")
        in2 = os.path.join(cwd, secondread)
        outrename = filename.replace("_1.fastq_1_trimmed.fq",".bam")
        out = "/home/blmatsum/working/bamout/" + outrename
        cmd = "bbmap.sh ref=/home/blmatsum/working/datafiles/sequence.phages_clustered.fna in={} in2={} out={}".format(in1,in2,out)
        os.system(cmd)
An example of the commands I want to run would be:
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077355_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077355_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077355.bam
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077366_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077366_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077366.bam
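For the "in groups of two" part, one possible approach is to build each command as an argument list and hand the lists to a thread pool capped at two workers. This is only a sketch and has not been run against bbmap.sh. build_bbmap_command and run_commands are names invented for this example, and the ref=, in=, in2= and out= arguments are simply copied from the command shown above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_bbmap_command(ref, in1, in2, out):
    """Return the bbmap.sh call from the question as an argument list."""
    return ["bbmap.sh", "ref=" + ref, "in=" + in1, "in2=" + in2, "out=" + out]


def run_commands(commands, max_workers=2):
    """Run every argument list as a subprocess, at most max_workers at once.

    Returns the exit codes in the same order as the input commands.
    """
    def run_one(args):
        # Each worker thread blocks here until its own child process exits.
        return subprocess.run(args).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

Threads are enough here because each worker only waits for an external process. In the original loop you would append build_bbmap_command(...) to a list instead of calling os.system(cmd), then call run_commands(that_list) once after the loop.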
I have a C file say, myfile.c.
Now to compile I am doing : gcc myfile.c -o myfile
So now to run this I need to do : ./myfile inputFileName > outputFileName
Where inputFileName and outputFileName are 2 command line inputs.
Now I am trying to execute this from within a Python program using the approach below, but it is not working properly, maybe because of the >
import subprocess
import sys
inputFileName = sys.argv[1];
outputFileName = sys.argv[2];
subprocess.run(['/home/dev/Desktop/myfile', inputFileName, outputFileName])
Where /home/dev/Desktop is the name of my directory and myfile is the name of the executable file.
What should I do?
The > that you use in your command is shell-specific syntax for output redirection. If you want to use that same syntax from Python, you have to invoke the shell to interpret it for you, with shell=True and a single command-line string (not a list).
Like this:
subprocess.run(f'/home/dev/Desktop/myfile "{inputFileName}" > "{outputFileName}"', shell=True)
If you want to do this through Python only without invoking the shell (which is what shell=True does) take a look at this other Q&A: How to redirect output with subprocess in Python?
You can open the output file in Python, and pass the file object to subprocess.run().
import subprocess
import sys
inputFileName = sys.argv[1];
outputFileName = sys.argv[2];
with open(outputFileName, "w") as out:
    subprocess.run(['/home/dev/Desktop/myfile', inputFileName], stdout=out)
I've been trying to figure this out for hours with no luck. I have a list of directories that have subdirectories and other files of their own. I'm trying to traverse all of them and move all of their content to a specific location. I tried shutil and glob but I couldn't get them to work. I even tried to run shell commands using subprocess.call and that did not work either. I understand that it didn't work because I didn't apply it properly, but I couldn't find any solution that moves all contents of a directory to another.
files = glob.glob('Food101-AB/*/')
dest = 'Food-101/'
if not os.path.exists(dest):
    os.makedirs(dest)
subprocess.call("mv Food101-AB/* Food-101/", shell=True)
# for child in files:
#     shutil.move(child, dest)
I'm trying to move everything in Food101-AB to Food-101
The shutil module of the standard library is the way to go:
>>> import shutil
>>> shutil.move("Food101-AB", "Food-101")
If you don't want to move Food101-AB folder itself, try using this:
import shutil
import os
for i in os.listdir("Food101-AB"):
    shutil.move(os.path.join("Food101-AB", i), "Food-101")
For more information about move function:
https://docs.python.org/3/library/shutil.html#shutil.move
Try replacing call with run so that you can retrieve the stdout, stderr and return code of your command:
from subprocess import run, CalledProcessError
source_dir = "full/path/to/src/folder"
dest_dir = "full/path/to/dest/folder"
try:
    res = run(["mv", source_dir, dest_dir], check=True, capture_output=True)
except CalledProcessError as ex:
    print(ex.stdout, ex.stderr, ex.returncode)
So I have written a piece of code which first runs a PowerShell command to generate a UTF-8 version of a DAT file (I have been having special character issues with the original file, hence this step). After that, I try to open the newly created file. But the issue is, I keep getting 'FileNotFoundError: [Errno 2]'. Initially I was only trying with the file name, since the newly created file was in the same folder, but then I tried to generate the absolute path as well.
import os
import subprocess
subprocess.Popen('powershell.exe -Command "Get-Content .\Own.DAT | Set-Content -Encoding utf8 Own1.dat"')
filepath = __file__
filepath = filepath[:-7]
with open(filepath+"Own1.dat", "r") as f:
I can confirm that filepath+"Own1.dat" is fetching the correct filepath. Yet can't figure out what the issue could be.
Edit: Someone asked for confirmation, here is the message i am getting:
C:\Users\Debojit\MiniConda3\python.exe "E:/My Documents/Work/essbase/ownership/test.py"
Traceback (most recent call last):
File "E:/My Documents/Work/essbase/ownership/test.py", line 18, in <module>
with open(filepath+"Own1.dat", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'E:/My Documents/Work/essbase/ownership/Own1.dat'
Process finished with exit code 1
Note: Curiously enough, if I put the PowerShell command into a separate batch file and write code in the Python script to run it, it works without any issues. Here is the code I am talking about:
import os
import subprocess
from subprocess import Popen
p = Popen("conversion.bat", cwd=r"E:\My Documents\Work\essbase\ownership")
stdout, stderr = p.communicate()
filepath = __file__
filepath = filepath[:-7]
with open(filepath+"Own1.dat", "r") as f:
The conversion.bat file contains the following
powershell.exe -Command "Get-Content .\Own.DAT | Set-Content -Encoding utf8 Own1.DAT"
But I don't want to include a separate batch file to go with the python script.
Any idea what might be causing the issue?
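Your error is unrelated to PowerShell. Popen runs asynchronously: it returns as soon as the process has started, so Own1.dat may not exist yet when you call open(). In the batch-file version you call communicate(), which waits for the process to finish; in the other version you do not.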
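A minimal sketch of the "wait first, then open" idea, without a batch file. subprocess.run blocks until the child has exited, so the output file exists by the time open() is called. convert_then_read is a name invented for this example, and the demo uses a small Python child process as a stand-in for the PowerShell command so that it can run anywhere.

```python
import os
import subprocess
import sys
import tempfile


def convert_then_read(command, workdir, output_name):
    """Run command in workdir, wait for it to exit, then read its output file."""
    # run() returns only after the child has finished; check=True raises
    # CalledProcessError if the command exits with a non-zero code.
    subprocess.run(command, cwd=workdir, check=True)
    with open(os.path.join(workdir, output_name), "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Stand-in for the conversion step: a child process that writes Own1.dat.
    writer = [sys.executable, "-c",
              "open('Own1.dat', 'w', encoding='utf-8').write('converted')"]
    with tempfile.TemporaryDirectory() as tmp:
        print(convert_then_read(writer, tmp, "Own1.dat"))
```

With the command from the question, the argument list would presumably be ['powershell.exe', '-Command', r'Get-Content .\Own.DAT | Set-Content -Encoding utf8 Own1.dat'], with workdir set to the folder that contains Own.DAT. That part is untested here.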
You're using Popen() incorrectly.
If you want to run a command and also pass arguments to it, you have to pass them as a list, like so:
subprocess.Popen(['powershell.exe', '-Command', ...])
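On a POSIX system, Popen given a single string (without shell=True) tries to run a command literally named powershell.exe -Command "Get-Content ... which of course doesn't exist. (On Windows the string is handed to the OS as a whole command line, so it can still start the program, but the list form is the portable one.)
To use a simpler example, this code won't work on a POSIX system: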
subprocess.Popen('ls -l')
because it's trying to run a command literally named ls -l.
But this does work:
subprocess.Popen(['ls', '-l'])
I still couldn't figure out why the error was happening. But I found a workaround:
with open("conversion.bat","w") as f:
    f.writelines("powershell.exe -Command \"Get-Content '" +fileName+ "' | Set-Content -Encoding utf8 Own1.dat\"")
from subprocess import Popen
p = Popen("conversion.bat", cwd=os.path.dirname(os.path.realpath(__file__)))
stdout, stderr = p.communicate()
os.remove("conversion.bat")
Basically I create the batch file, run it, and then delete it once the output file has been created. I don't know why I have to use this route, but it works.
I want my script to go to a particular file path given to os.walk() and then execute a grep command on all the files under that location, redirecting the output to a file. Below is the script I created, but the subprocess executes the ls -al command in the current directory, while the print statement shows me the contents from os.walk. So I need the subprocess to execute the command under the os.walk path as well.
with open('ipaddressifle.out', 'w') as outfile:
    for pdir, dir, files in os.walk(r'/Users/skandasa/perforce/projects/releases/portal-7651'):
        for items in files:
            print(items)
            #subprocess.call(['ls', '-al'])
            process = subprocess.Popen(['ls', '-al'], shell= True, stdout=outfile, stderr=outfile)
            #process = subprocess.Popen(['grep', 'PORTALSHARED','*', '|', 'awk', '-F', '[','{print', '$1}'], shell= True, stdout=outfile, stderr=outfile)
            [output, err] = process.communicate()
And is there any other way apart from adding a cd command to the subprocess call?
You can use os.chdir(path) to change the current working directory.
I reworked your snippet to use subprocess.check_output to call a command and retrieve its stdout. I also use shlex.split(command) so that the command can be written as a single string and split correctly into an argument list.
The script walks DIRECTORY with os.walk and writes the output of ls -al for each directory into OUTPUT_FILE:
import os
import shlex
from subprocess import check_output
DIRECTORY = '/tmp'
OUTPUT_FILE = '/tmp/output.log'
# check_output returns bytes, so the output file is opened in binary mode
with open(OUTPUT_FILE, 'wb') as output:
    for parent, _, _ in os.walk(DIRECTORY):
        os.chdir(parent)
        output.write(check_output(shlex.split('ls -al')))
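As for another way that needs neither cd nor os.chdir: the subprocess functions accept a cwd argument that sets the working directory for that one child process only. Below is a hedged sketch of the same walk using it. run_in_each_directory is a made-up name, and the default command assumes a POSIX system where ls is available.

```python
import os
import subprocess


def run_in_each_directory(root, output_path, command=("ls", "-al")):
    """Run command once in every directory under root.

    The combined stdout and stderr of each run is written to output_path.
    """
    with open(output_path, "wb") as outfile:
        for parent, _dirs, _files in os.walk(root):
            result = subprocess.run(
                list(command),
                cwd=parent,                # working directory of the child only
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # merge stderr into stdout
                check=True,
            )
            outfile.write(result.stdout)
```

For example, run_in_each_directory('/tmp', '/tmp/output.log') would mirror the snippet above. The script's own working directory never changes, so relative paths elsewhere in the program keep working.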