Run command in Python

I am writing a Python script that will iterate over a directory and run a command for each matching file, like this:
import os
import shutil
import uuid

def process_file(filepath):
    # get the file name and extension
    name, ext = os.path.splitext(os.path.basename(filepath))
    # copy the file to a tmp file, because the filepath may contain non-ASCII characters
    tmp_file_src = 'c:/' + str(uuid.uuid1()) + '.xx'
    tmp_file_dest = tmp_file_src.replace('.xx', '.yy')
    shutil.copy2(filepath, tmp_file_src)
    # run the command, which may take 10-20 seconds
    os.system('xx %s %s' % (tmp_file_src, tmp_file_dest))
    # copy the generated output file back, restoring the original name
    shutil.copy2(tmp_file_dest, os.path.join(os.path.dirname(filepath), name + '.yy'))

for root, directories, files in os.walk(directory):
    for filename in files:
        if filename.endswith('.xx'):
            filepath = os.path.join(root, filename)
            process_file(filepath)
As you can see, it is one file to one command, and I have to wait for each command to finish completely before doing the further work.
Right now the execution process is sequential:
file1 --> file2 --> ...
I wonder if they can be executed in parallel:
file1
file2
...

One possibility is to use the threading module.
import os
import threading

def starter_function(cmd_to_execute):
    os.system(cmd_to_execute)

execution_thread = threading.Thread(target=starter_function, args=(cmd_to_execute,))
execution_thread.start()
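Building on that idea, a bounded pool avoids spawning one thread per file when the directory is large. Below is a minimal sketch using `concurrent.futures`; the `xx` tool and the `.xx`/`.yy` file names are placeholders from the question, and a trivial interpreter call stands in for the real command so the sketch is runnable:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_command(src, dest):
    # Placeholder for the real external tool, e.g.:
    #   subprocess.run(['xx', src, dest])
    # A trivial interpreter call stands in here.
    result = subprocess.run([sys.executable, '-c', 'pass'])
    return result.returncode

pairs = [('file1.xx', 'file1.yy'), ('file2.xx', 'file2.yy')]
with ThreadPoolExecutor(max_workers=4) as pool:
    # at most four commands run at the same time
    exit_codes = list(pool.map(lambda p: run_command(*p), pairs))
print(exit_codes)
```

`pool.map` preserves input order, so each exit code lines up with its input pair.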

Related

How to execute a python script in bash in parallel

I am trying to run a bash script multiple times on a cluster. The issue, however, is that I need to grab certain file names to fill in the command, which I only know how to do via Python.
Note: I want to run the last line (the line that calls the script) in parallel, in groups of about two. How can I do this?
I have thought of outputting all the commands to a .txt file and piping it through parallel, but I feel that is not the most efficient approach.
Thank you for any help
The script looks like this:
#!/usr/bin/python
import os
import sys

cwd = os.getcwd()
for filename in os.listdir(cwd):
    if "_2_" in filename:
        continue
    elif "_1_" in filename:
        in1 = os.path.join(cwd, filename)
        secondread = filename.replace("_1.fastq_1_trimmed.fq", "_1.fastq_2_trimmed.fq")
        in2 = os.path.join(cwd, secondread)
        outrename = filename.replace("_1.fastq_1_trimmed.fq", ".bam")
        out = "/home/blmatsum/working/bamout/" + outrename
        cmd = "bbmap.sh ref=/home/blmatsum/working/datafiles/sequence.phages_clustered.fna in={} in2={} out={}".format(in1, in2, out)
        os.system(cmd)
Examples of the commands I want to run would be:
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077355_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077355_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077355.bam
bbmap.sh ref=/home/working/datafiles/sequence.phages_clustered.fna in=/home/working/trimmed/SRR7077366_1.fastq_1_trimmed.fq in2=/home/working/trimmed/SRR7077366_1.fastq_2_trimmed.fq out=/home/working/bamout/SRR7077366.bam

How to run a program in a python IDE given the file path to the program

I have a program that I need to run in a list of folders. I have the file path to where the program is and the list to all of the folders where I would like to run the program on my computer (there's about 200 of them). I can also change the current working directory to get to the folder I want to be in.
How do you get python to execute a program through the actual IDE once you are in the folder you want to run the program in? I don't want to have to manually open a terminal on my computer, type "[Program name] [Path to file where I want to run the program]" 200+ times. The code I have is below
import os
import numpy as np
from astropy.table import Table

cat = '/home/myname/catalogue.csv'
cat = Table.read(cat, format="ascii")
ID = np.array(cat['ID'])
ID = ID.astype(str)
folder_path = np.array([])
for i in ID:
    folder_path = np.append(folder_path, '/home/myname/python_stuff/{}/'.format(i))
folder_path = folder_path[folder_path.argsort()]
for i in zip(folder_path, ID):
    os.chdir(i[0])
    name = i[1] + (".setup")
Essentially, after the last line in my code I want another line that is the Python equivalent of "run [program name] on name" (where name is the file in each folder that I want the program to use).
You can run the program as a subprocess with a changed working directory:
import subprocess
p = subprocess.Popen(["python", "some_script.py"], stdout=subprocess.PIPE, cwd=PATH)
output = p.communicate()[0]
cwd means current working directory, so you can use it to run your script from another script in a specific directory.
https://docs.python.org/3/library/subprocess.html#popen-constructor Check this guide to run a subprocess in a different directory.
import subprocess
for i in zip(folder_path, ID):
    os.chdir(i[0])
    name = i[1] + (".setup")
    subprocess.call("[path to program] %s" % (name), shell=True)
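A small variant of that loop passes `cwd=` to `subprocess.call` instead of calling `os.chdir`, so the script's own working directory never changes. In this sketch the folder list and IDs are placeholders, and `echo` stands in for the real program:

```python
import subprocess

# placeholders for the sorted folder list and catalogue IDs from the question
folder_path = ['/tmp']
ID = ['example']

for folder, file_id in zip(folder_path, ID):
    name = file_id + '.setup'
    # cwd= makes the child process start inside the target folder
    code = subprocess.call(['echo', name], cwd=folder)
print(code)
```

Using the list form (instead of one string with `shell=True`) also avoids problems with spaces in file names.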

How to use os.system to convert all files in a folder at once using external python script

I've managed to find out the method to convert a file from one file extension to another (.evtx to .xml) using an external script. Below is what I am using:
os.system("file_converter.py file1.evtx > file1.xml")
This successfully converts a file from .evtx to .xml using the external script I called (file_converter.py).
I am now trying to find a method, using 'os.system' or otherwise, to convert more than one file at once: I would like my program to go into a folder and convert all 10 files it contains to .xml format.
My questions are: how is this possible, given that os.system only takes one argument, and how can I make it search through a directory? Unlike the first file I converted, which was in my home directory, the folder with the 10 files is nested inside another folder. I also want each file to keep its name, with only the extension changing from '.evtx' to '.xml'.
The file "file_converter.py" is downloadable from here
import threading
import os

def file_converter(file):
    os.system("file_converter.py {0} > {1}".format(file, file.replace(".evtx", ".xml")))

base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
for file in os.listdir(base_dir):
    # os.listdir returns bare names, so join with the directory first
    threading.Thread(target=file_converter, args=(os.path.join(base_dir, file),)).start()
Here is my sample code.
You can spawn multiple threads to run the operation concurrently. The program checks all files in the directory and converts each one.
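One caveat with the snippet above: the threads are started but never joined, so the main script may exit before every conversion has finished. A minimal sketch of keeping the thread objects and joining them (a stand-in function replaces the converter so the sketch is runnable):

```python
import threading

results = []

def fake_convert(path):
    # stand-in for the file_converter call; records which file it handled
    results.append(path)

threads = [threading.Thread(target=fake_convert, args=(p,))
           for p in ['a.evtx', 'b.evtx']]
for t in threads:
    t.start()
for t in threads:
    t.join()   # block until every conversion thread is done
print(len(results))
```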
EDIT python2.7 version
Now that we have more information about what you want, I can help you.
This program can handle multiple files concurrently from one folder, and it also checks the subfolders.
import subprocess
import os

base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"
commands_to_run = list()

# Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        # if it is a directory, search it for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            # check that the file has the right extension and append the command to execute later
            if(entry.endswith(".evtx")):
                commands_to_run.append("C:\\Python27\\python.exe file_converter.py {0} > {1}".format(fullPath, fullPath.replace(".evtx", ".xml")))
    return allFiles

print "Searching for files"
file_list(base_dir)
print "Running conversion"
processes = [subprocess.Popen(command, shell=True) for command in commands_to_run]
print "Waiting for converted files"
for process in processes:
    process.wait()
print "Conversion done"
The subprocess module can be used in two ways:
subprocess.Popen: runs the process and continues the execution
subprocess.call: runs the process and waits for it; this function returns the exit status. A value of zero indicates that the process terminated successfully.
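The difference is easy to demonstrate with a trivial command; `sys.executable` is used here so the sketch does not depend on any external tool:

```python
import subprocess
import sys

# call() blocks until the child exits and returns its status directly
status = subprocess.call([sys.executable, '-c', 'pass'])

# Popen() returns immediately; wait() collects the status later,
# which is what lets many Popen processes run side by side
p = subprocess.Popen([sys.executable, '-c', 'pass'])
status_later = p.wait()

print(status, status_later)
```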
EDIT python3.7 version
If you want to solve the whole problem, just embed the code you shared from GitHub in your program. You can easily wrap it as a function.
import threading
import os
import Evtx.Evtx as evtx
import Evtx.Views as e_views

base_dir = "C:\\Users\\carlo.zanocco\\Desktop\\test_dir\\"

def convert(file_in, file_out):
    tmp_list = list()
    with evtx.Evtx(file_in) as log:
        tmp_list.append(e_views.XML_HEADER)
        tmp_list.append("<Events>")
        for record in log.records():
            try:
                tmp_list.append(record.xml())
            except Exception as e:
                print(e)
        tmp_list.append("</Events>")
    with open(file_out, 'w') as final:
        final.writelines(tmp_list)

# Search all files
def file_list(directory):
    allFiles = list()
    for entry in os.listdir(directory):
        fullPath = os.path.join(directory, entry)
        # if it is a directory, search it for more files
        if os.path.isdir(fullPath):
            allFiles = allFiles + file_list(fullPath)
        else:
            # check that the file has the right extension and start a conversion thread
            if(entry.endswith(".evtx")):
                threading.Thread(target=convert, args=(fullPath, fullPath.replace(".evtx", ".xml"))).start()
    return allFiles
print("Searching and converting files")
file_list(base_dir)
If you want to see the output files being written as they are generated, edit the function as follows:
def convert(file_in, file_out):
    with evtx.Evtx(file_in) as log:
        with open(file_out, 'a') as final:
            final.write(e_views.XML_HEADER)
            final.write("<Events>")
            for record in log.records():
                try:
                    final.write(record.xml())
                except Exception as e:
                    print(e)
            final.write("</Events>")
UPDATE
If you want to delete the '.evtx' files after the conversion, you can simply add the following rows at the end of the convert function:
try:
    os.remove(file_in)
except Exception as ex:
    raise ex
Here a bare try .. except is enough, because the thread is only started when the input value is a file.
If the file doesn't exist, this function throws an exception, so it's necessary to check os.path.isfile() first.
import os, sys

DIR = "D:/Test"
# ...or as a command line argument
DIR = sys.argv[1]

for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    name, ext = os.path.splitext(f)
    if ext == ".txt":
        new_path = os.path.join(DIR, f"{name}.xml")
        os.rename(path, new_path)
This iterates over a directory and renames every .txt file to .xml. Note that os.rename only changes the extension; it does not convert the file's contents.

How do I move a file from local to HDFS in Python?

I have a script to check a directory for files. If the right files (with keyword) is present, I want to move that/those file(s) to an HDFS location.
import os
import subprocess

tRoot = "/tmp/mike"
keyword = "test"

for root, dirs, files in os.walk(tRoot):
    for file in files:
        if keyword in file:
            fullPath = str(os.path.join(root, file))
            subprocess.call(['hdfs', 'dfs', '-copyFromLocal', '/tmp/mike/test*', 'hdfs:///user/edwaeadt/app'], shell=True)
I'm seeing below error:
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
I also tried with
subprocess.call(['hadoop', 'fs', '-copyFromLocal', '/tmp/mike/test*', 'hdfs:///user/edwaeadt/app'], shell=True)
But I'm seeing
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
CLASSNAME
Also, it seems this loop is running 3 times: the file was moved to the HDFS location, but I also see two messages saying the file exists. It looks like copyFromLocal runs 3 times. Any ideas?
If you are intent on using subprocess and shell=True, then your command should read:
subprocess.call(['hadoop fs -copyFromLocal /tmp/mike/test* hdfs:///user/edwaeadt/app'], shell=True)
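The wildcard only works here because shell=True hands the whole string to a shell for expansion (which is also why the list-of-arguments form failed with a usage error). An alternative sketch expands the pattern with `glob` in Python, so the list form can be used without shell=True; the paths are the ones from the question, so the match list may well be empty on another machine:

```python
import glob
import subprocess

# expand the wildcard in Python instead of relying on the shell
matches = glob.glob('/tmp/mike/test*')
cmd = ['hadoop', 'fs', '-copyFromLocal'] + matches + ['hdfs:///user/edwaeadt/app']
# subprocess.call(cmd) would then run it without shell=True
print(cmd[:3])
```

This also sidesteps shell quoting issues when file names contain spaces.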

automation to run script files for number of definition files using python

I have an autojson file that I need to run with the following command in the terminal:
python script.py -i input.json -o output.h
Now I want to run the same script automatically for a number of input files stored in a folder, and store the output in another folder. How can I write a Python script to automate this?
As it stands I have to keep rewriting the input file names; instead, the script should read the files from a given folder by itself and generate the outputs.
import os

cwd = os.getcwd()
path_def = os.path.join(cwd, "script")      # folder holding the input .json files
myFiles = os.listdir(path_def)

for filename in myFiles:
    if not filename.endswith(".json"):
        continue
    x = os.path.join(path_def, filename)
    # note: strip(".json") would remove matching characters, not the suffix,
    # so use os.path.splitext instead
    y = os.path.splitext(filename)[0]
    os.system("python script.py -i " + x + " -o " + y + ".h")
