I've got a ruffus pipeline in Python 2.7, but when I call it with -n or --just_print it still runs all the actual tasks instead of just printing the pipeline like it's supposed to. I:
* don't have a -n argument that would supersede the built-in one (although I do have other command-line arguments)
* have a bunch of functions with @transform() or @merge() decorators
* end the pipeline with a run_pipeline() call
Has anyone else experienced this problem? Many thanks!
As of ruffus version 2.4, you can use the built-in ruffus.cmdline module, which registers the appropriate flags via argparse, for example:
from ruffus import *

parser = cmdline.get_argparse(description='Example pipeline')
options = parser.parse_args()

@originate("test_out.txt")
def run_testFunction(output):
    with open(output, "w") as f:
        f.write("it's working!\n")

cmdline.run(options)
Then run your pipeline from the terminal with a command like:
python script.py --verbose 6 --target_tasks run_testFunction --just_print
If you want to do this manually instead (which is necessary for older versions of ruffus), you can call pipeline_printout() rather than pipeline_run(), using argparse so that the --just_print flag leads to the appropriate call, for example:
from ruffus import *
import argparse
import sys

parser = argparse.ArgumentParser(description='Example pipeline')
parser.add_argument('--just_print', dest='feature', action='store_true')
parser.set_defaults(feature=False)
args = parser.parse_args()

@originate("test_out.txt")
def run_testFunction(output):
    with open(output, "w") as f:
        f.write("it's working!\n")

if args.feature:
    pipeline_printout(sys.stdout, run_testFunction, verbose=6)
else:
    pipeline_run(run_testFunction, verbose=6)
You would then run the command like:
python script.py --just_print
I managed to write some simple Python code that uses values from a .csv file to create an .svg file.
However, argparse does not pick up the value from the command line: I'm not able to override the default.
I want to set -c/--columns from the command line.
import argparse
parser = argparse.ArgumentParser("svg.py")
parser.add_argument("-c","--columns",help="number of columns (default=20)",default=20)
args = parser.parse_args()
When I run
svg.py -c 24
the value is still 20.
Maybe there is an error in how you pass the argument. Here is my working example.
svg.py
import argparse
parser = argparse.ArgumentParser("svg.py")
parser.add_argument("-c","--columns",help="number of columns (default=20)",default=20)
args = parser.parse_args()
print(args.columns)
It runs correctly
$ > python svg.py -c 24
24
Hope this helps you figure out the problem.
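Another possible cause, worth checking since the full script isn't shown: without type=int, argparse stores the value as the string "24", which later arithmetic or comparisons may silently mishandle, making it look as if the default survived. A minimal sketch:

```python
import argparse

parser = argparse.ArgumentParser("svg.py")
# type=int converts the command-line string to an integer;
# without it, args.columns would be the string "24".
parser.add_argument("-c", "--columns", type=int, default=20,
                    help="number of columns (default=20)")

args = parser.parse_args(["-c", "24"])  # simulates: svg.py -c 24
print(args.columns)  # an int, not the string "24"
```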
I currently have the following piece of code in bash; now I want to do the same in Python. However, the Python script being called is very long, and turning it into a function would be a very tedious task. How can I do this in Python without modifying the script being called?
gfs15_to_am10.py $LAT $LON $ALT $GFS_CYCLE $FORECAST_HOUR \
> layers.amc 2>layers.err
You can use the os module.
import os
os.system("bash commands")
Two options for passing parameters:
option A:
import os
LAT = ''
os.system(f"echo {LAT}")
option B:
use the argparse module to read parameters as script arguments.
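To also reproduce the output redirection from the original bash line, a sketch using subprocess (the script name comes from the question; the argument values are placeholders, and sys.executable stands in for whichever interpreter runs the script):

```python
import subprocess
import sys

# Placeholders standing in for $LAT, $LON, $ALT, $GFS_CYCLE, $FORECAST_HOUR.
lat, lon, alt = "40.0", "-105.0", "1500"
gfs_cycle, forecast_hour = "00", "06"

# Reproduce `... > layers.amc 2> layers.err` by handing open files to run().
with open("layers.amc", "w") as out, open("layers.err", "w") as err:
    subprocess.run(
        [sys.executable, "gfs15_to_am10.py",
         lat, lon, alt, gfs_cycle, forecast_hour],
        stdout=out,   # > layers.amc
        stderr=err,   # 2> layers.err
    )
```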
How does one specify command line arguments using argparse for a PySpark script? I've been breaking my head over this one and I swear I can't find the solution anywhere else.
Here's my test script:
import argparse
from pyspark.sql import SparkSession

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--v1", "-a", type=int, default=2)
    parser.add_argument("--v2", "-b", type=int, default=3)
    args = vars(parser.parse_args())

    spark = (SparkSession.builder
             .appName("Test")
             .master("local[*]")
             .getOrCreate())

    result = args['v1'] + args['v2']
    return result

if __name__ == "__main__":
    result = main()
    print(result)
When I try running the file using spark-submit file.py --v1 5 --v2 4, I get an error as shown below:
[TerminalIPythonApp] CRITICAL | Bad config encountered during initialization:
[TerminalIPythonApp] CRITICAL | Unrecognized flag: '--v1'
However, when I don't specify the arguments (just spark-submit file.py), it does the sum correctly, using the default values 2 and 3 from the argument parser, and displays "5" as expected. So clearly it's reading the values from argparse correctly. What's going wrong with the command when I actually pass non-default values?
NOTE: Am using PySpark 2.4.4 and Python 3.6.
EDIT: Of course, I could just use sys.argv and be done with it, but argparse is so much better!
Based on the TerminalIPythonApp error message (similar to this one), PySpark was trying to pass the argparse arguments to ipython instead of python. To fix this, point Spark's Python environment at python3, not ipython.
Add/modify the lines in /path/to/pyspark/conf/spark-env.sh:
export SPARK_HOME=/home/user/spark-2.4.0-bin-hadoop2.7/
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=python3"
This ensures that PySpark looks for the python3 executable, after which the argparse arguments should be read without any issues.
I am trying to create a Python script that runs a Perl script from the Mac terminal. The popular 3D printer slicing engine, Slic3r, has a command-line interface, which is written in Perl. I want to write a Python script to automate some processes, since Python is the language I know best. If I type the commands I want to use directly into the terminal, it works as it should; however, if I try to use Python's subprocess, it works for some commands but not others.
For example, if I use my script to fetch the Slic3r version using the syntax outlined in the docs, it works correctly. This script works:
import os
import subprocess
import sys
os.system("cd Slic3r")
perl = "perl"
perl_script = "/Users/path/to/Slic3r/slic3r.pl"
params = "--version"
pl_script = subprocess.Popen([perl, perl_script, params], stdout=sys.stdout)
pl_script.communicate()
print 'done'
This returns:
1.3.0-dev
done
If I use a command such as --info (see Slic3r docs under repairing models for more info) using the same script I get:
In:
import os
import subprocess
import sys
os.system("cd Slic3r")
perl = "perl"
perl_script = "/Users/path/to/Slic3r/slic3r.pl"
params = "--info /Users/path/to/Desktop/STL_Files/GEAR.stl"
pl_script = subprocess.Popen([perl, perl_script, params], stdout=sys.stdout)
pl_script.communicate()
print 'done'
Out:
Unknown option: info /Users/path/to/Desktop/STL_Files/GEAR.stl
Slic3r 1.3.0-dev is a STL-to-GCODE translator for RepRap 3D printers
written by Alessandro Ranellucci <aar@cpan.org> - http://slic3r.org/
Usage: slic3r.pl [ OPTIONS ] [ file.stl ] [ file2.stl ] ...
From what I have researched, I suspect that there is some issue with the whitespace in the string being used as a single argument. I had never used subprocess before attempting this project, so a simple syntax error is likely.
I know that the Slic3r syntax is correct because it works perfectly if I type it directly into the terminal. Can anybody see what I am doing wrong?
subprocess.Popen accepts args as its first parameter. This can be a string with the complete command (including parameters), though note that on POSIX a plain string is treated as the program name unless you also pass shell=True:
args = "perl /Users/path/to/Slic3r/slic3r.pl --info /Users/path/to/Desktop/STL_Files/GEAR.stl"
pl_script = subprocess.Popen(args, stdout=sys.stdout, shell=True)
or a list consisting of the actual command and all its parameters (the actual command in your case is perl):
args = ["perl",
        "/Users/path/to/Slic3r/slic3r.pl",
        "--info",
        "/Users/path/to/Desktop/STL_Files/GEAR.stl"]
pl_script = subprocess.Popen(args, stdout=sys.stdout)
The latter is preferred because it bypasses the shell and directly executes perl. From the docs:
args should be a sequence of program arguments or else a single
string. By default, the program to execute is the first item in args
if args is a sequence. If args is a string, the interpretation is
platform-dependent and described below. See the shell and executable
arguments for additional differences from the default behavior. Unless
otherwise stated, it is recommended to pass args as a sequence.
(emphasis mine)
The args list may of course be built with Python's standard list operations:
base_args = ["perl",
             "/Users/path/to/Slic3r/slic3r.pl"]
options = ["--info",
           "/Users/path/to/Desktop/STL_Files/GEAR.stl"]

args = base_args + options
args.append("--verbose")

pl_script = subprocess.Popen(args, stdout=sys.stdout)
Sidenote: You wrote os.system("cd Slic3r"). This will open a shell, change the directory in that shell, and then exit. Your Python script will still operate in the original working directory. To change it, use os.chdir("Slic3r") instead. (See here.)
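Relatedly, Popen and run also accept a cwd argument, which changes the working directory for the child process only. A portable sketch (using a throwaway temp directory rather than the Slic3r path, so it runs anywhere):

```python
import os
import subprocess
import sys
import tempfile

# Run a child process in a different working directory
# without touching our own current directory.
workdir = tempfile.mkdtemp()
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getcwd())"],
    cwd=workdir, capture_output=True, text=True,
)
print(result.stdout.strip())  # the temp directory, not our own cwd
```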
You can also use shlex to break down complex arguments, especially on Mac or Unix.
More information here:
https://docs.python.org/2/library/shlex.html#shlex.split
e.g.
import shlex, subprocess
args = "perl /Users/path/to/Slic3r/slic3r.pl --info /Users/path/to/Desktop/STL_Files/GEAR.stl"
#using shlex to break down the arguments
mac_arg=shlex.split(args)
#shlex.split will return all the arguments in a list
Output:
['perl', '/Users/path/to/Slic3r/slic3r.pl', '--info', '/Users/path/to/Desktop/STL_Files/GEAR.stl']
This can then be used with Popen:
p1 = subprocess.Popen(mac_arg)
shlex's main advantage is that you don't need to worry about quoting: it will always split the command in a manner accepted by Popen.
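In particular, shlex.split honors shell-style quoting, so a path containing spaces survives as a single argument, which naive str.split would break apart (the path below is made up):

```python
import shlex

# The quoted path contains a space but stays one argument.
args = 'perl slic3r.pl --info "/Users/me/My Files/GEAR.stl"'
print(shlex.split(args))
# ['perl', 'slic3r.pl', '--info', '/Users/me/My Files/GEAR.stl']
```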
I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component.
I have never had to deal directly with argparse before. How can I use this without rewriting main()?
By the by, this writeup of Ukkonen's algorithm is fantastic.
An alternative for using argparse in IPython notebooks is to pass a string to:
args = parser.parse_args()
(line 303 from the git repo you referenced.)
It would be something like:
parser = argparse.ArgumentParser(
    description='Searching longest common substring. '
                'Uses Ukkonen\'s suffix tree algorithm and generalized suffix tree. '
                'Written by Ilya Stepanov (c) 2013')

parser.add_argument(
    'strings',
    metavar='STRING',
    nargs='*',
    help='String for searching',
)

parser.add_argument(
    '-f',
    '--file',
    help='Path for input file. First line should contain number of lines to search in'
)
and
args = parser.parse_args("AAA --file /path/to/sequences.txt".split())
Edit: It works.
Using args = parser.parse_args(args=[]) would solve the execution problem.
Or you can declare the arguments in a class:
class Args:
    data = './data/penn'
    model = 'LSTM'
    emsize = 200
    nhid = 200

args = Args()
I've had a similar problem before, but using optparse instead of argparse.
You don't need to change anything in the original script; just assign a new list to sys.argv like so:
import sys

if __name__ == "__main__":
    from Bio import SeqIO
    path = '/path/to/sequences.txt'
    sequences = [str(record.seq) for record in SeqIO.parse(path, 'fasta')]
    sys.argv = ['-f'] + sequences
    main()
If all arguments have a default value, then adding this to the top of the notebook should be enough:
import sys
sys.argv = ['']
(otherwise, just add necessary arguments instead of the empty string)
I ended up using BioPython to extract the sequences and then editing Ilya Stepanov's implementation to remove the argparse methods.
import imp
from Bio import SeqIO

seqs = []
lcsm = imp.load_source('lcsm', '/path/to/ukkonen.py')
for record in SeqIO.parse('/path/to/sequences.txt', 'fasta'):
    seqs.append(record)
lcsm.main(seqs)
For the algorithm, I had main() take one argument, his strings variable, but this sends the algorithm a list of special BioPython Sequence objects, which the re module doesn't like. So I had to extract the sequence string
suffix_tree.append_string(s)
to
suffix_tree.append_string(str(s.seq))
which seems kind of brittle, but that's all I've got for now.
I faced a similar problem when invoking argparse: the string '-f' was causing it. Just removing that from sys.argv does the trick.
import sys
if __name__ == '__main__':
    if '-f' in sys.argv:
        sys.argv.remove('-f')
    main()
Clean sys.argv
import sys; sys.argv=['']; del sys
https://github.com/spyder-ide/spyder/issues/3883#issuecomment-269131039
Here is my code, which works well, and I don't have to worry about the environment being changed:
import sys

temp_argv = sys.argv
try:
    sys.argv = ['']
    print(sys.argv)
    args = parser.parse_args()
finally:
    sys.argv = temp_argv
    print(sys.argv)
Suppose you have this small piece of code in Python:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", help="increase output verbosity",
                    action="store_true")
parser.add_argument("-v_1", "--verbose_1", help="increase output verbosity",
                    action="store_true")
args = parser.parse_args()
To write this code in a Jupyter notebook, write this:
import argparse
args = argparse.Namespace(verbose=False, verbose_1=False)
Note: In Python you can pass arguments at runtime, but in a Jupyter notebook that will not be the case, so be careful with the data types of your arguments.
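A hand-built Namespace behaves like the object parse_args() returns, so downstream code reading args.verbose works unchanged; a quick sketch:

```python
import argparse

# Construct the namespace directly, skipping command-line parsing.
args = argparse.Namespace(verbose=False, verbose_1=False)
print(args.verbose)  # attribute access, just as after parse_args()
print(vars(args))    # the same values as a plain dict
```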
If arguments passed by the iPython environment can be ignored (do not conflict with the specified arguments), then the following works like a charm:
# REPLACE args = parser.parse_args() with:
args, unknown = parser.parse_known_args()
From: https://stackoverflow.com/a/12818237/11750716
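As a sketch of why this works: parse_known_args() returns a (namespace, leftovers) pair instead of erroring out, so the kernel's own flags (such as -f kernel.json) pass through untouched:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--n", type=int, default=1)

# Simulate a notebook kernel sneaking in its own arguments.
args, unknown = parser.parse_known_args(["--n", "5", "-f", "kernel.json"])
print(args.n)      # the recognized argument is parsed normally
print(unknown)     # the kernel's flags are returned, not rejected
```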
If you don't want to change any of the arguments or the working mechanism of the original argparse code you have written or copied, there is a simple solution that works most of the time.
You could just install jupyter_argparser using the command below:
pip install jupyter_argparser
The code works without any changes, thanks to the maintainer of the package.