How would the output of one script be passed as the input to another? For example if a.py outputs format.xml then how would a.py call b.py and pass it the argument format.xml? I think it's supposed to work like piping done on the command line.
I've been hired by a bunch of scientists with domain-specific knowledge, but sometimes their computer programming requirements don't make sense. There's a long chain of "modules", and my boss is adamant that one module should be one Python script, with the output of one module being the input of the next. I'm very new to Python, but if this design pattern rings a bell to anyone, let me know.
Worse yet, the project is to be converted to executable format (using py2exe), and there still has to be the same number of executable files as .py files.
The pattern makes sense in some cases, but for me it's when you want to be able to run each module as a self-sustained executable.
I.e. should you want to use the script from within FORTRAN or a similar language, the easiest way is to build the Python module into an executable and call it from FORTRAN.
That would not mean that one module is by definition one Python file, just that it has only one entry point and is in fact executable.
The one-module-per-script rule could be there to make it easier to locate the code, or to mail it to someone for code inspection or peer review (done often in scientific communities).
So the requirements may be a mix of technical and social requirements.
Anyway, back to the problem.
I would use the subprocess module to call the next module (with close_fds set to True). From the subprocess documentation:
If close_fds is true, all file descriptors except 0, 1 and 2 will be
closed before the child process is executed. (Unix only). Or, on
Windows, if close_fds is true then no handles will be inherited by the
child process. Note that on Windows, you cannot set close_fds to true
and also redirect the standard handles by setting stdin, stdout or
stderr.
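A minimal sketch of that call, using the hypothetical file names from the question (a.py producing format.xml, then invoking b.py on it):
#!/usr/bin/env python
# at the end of a.py, after format.xml has been written
import sys
from subprocess import check_call

# Run b.py with the same interpreter and hand it the output file as an argument.
check_call([sys.executable, 'b.py', 'format.xml'], close_fds=True)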
To emulate a | b shell pipeline in Python:
#!/usr/bin/env python
from subprocess import check_call
check_call('a | b', shell=True)
The a program writes to its stdout stream and knows nothing about the b program; the b program reads from its stdin and knows nothing about the a program.
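If you'd rather not go through the shell, the same pipeline can be wired up explicitly with two Popen objects (a sketch, assuming a and b are executables on the PATH):
#!/usr/bin/env python
from subprocess import Popen, PIPE

# Connect a's stdout directly to b's stdin, no shell involved.
a_proc = Popen(['a'], stdout=PIPE)
b_proc = Popen(['b'], stdin=a_proc.stdout)
a_proc.stdout.close()  # so b sees EOF when a exits
b_proc.wait()
a_proc.wait()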
A more flexible approach is to define functions and classes in the a.py and b.py modules that work with Python objects, and to implement the command-line interface that produces/consumes XML in terms of these functions in the if __name__ == "__main__" block, e.g. a.py:
#!/usr/bin/env python
import sys
import xml.etree.ElementTree as etree

def items():
    yield {'name': 'a'}
    yield {'name': 'b'}

def main():
    parent = etree.Element("items")
    for item in items():
        etree.SubElement(parent, 'item', attrib=item)
    etree.ElementTree(parent).write(sys.stdout)  # set encoding="unicode" on Python 3

if __name__ == "__main__":
    main()
This allows you to avoid unnecessary serialization to/deserialization from XML when the scripts are not called from the command line:
#!/usr/bin/env python
import a, b

for item in a.items():
    b.consume(item)
Note: item can be an arbitrary Python object such as dict or an instance of a custom class.
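For completeness, b.py could follow the same shape; consume() and its XML-reading main() below are assumptions for illustration, not code from the question:
#!/usr/bin/env python
import sys
import xml.etree.ElementTree as etree

def consume(item):
    # item is a plain Python object, here a dict such as {'name': 'a'}
    print(item['name'])

def main():
    # Command-line interface: read the XML that a.py writes to stdout.
    tree = etree.parse(sys.stdin)
    for element in tree.getroot():
        consume(element.attrib)

if __name__ == "__main__":
    main()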
Related
Probably related to "globals and locals in python exec()", "Python 2 How to debug code injected by the exec block" and "How to get local variables updated, when using the `exec` call?".
I am trying to develop a test framework for our desktop applications which uses click-bot-like functions.
My goal was to enable non-programmers to write small scripts which could work as test scripts. So my idea is to structure the test scripts as files like:
tests-folder
| -> 01-first-test.py
| -> 02-second-test.py
| ... etc
| -> fixture.py
And then just execute these scripts in alphabetical order. However, I also wanted to have fixtures which would define functions, classes and variables and make them available to the different scripts without the scripts having to import that fixture explicitly. If that works, I could also use that approach for two or more directory levels.
I could get it working-ish with some hacking around, but I am not entirely convinced. I have a test_sequence.py which looks like this:
from pathlib import Path
from copy import deepcopy

from my_module.test import Failure

def run_test_sequence(test_dir: str):
    error_occured = False
    fixture = {
        'some_pre_defined_variable': 'this is available in all scripts and fixtures',
        'directory_name': test_dir,
    }

    # Check if fixture.py exists and load that first
    fixture_file = Path(test_dir) / 'fixture.py'
    if fixture_file.exists():
        with open(fixture_file.absolute(), 'r') as code:
            exec(code.read(), fixture, fixture)

    # Go over all files in the test sequence directory and execute them
    for test_file in sorted(Path(test_dir).iterdir()):
        if test_file.name == 'fixture.py':
            continue

        # Make a deepcopy, so scripts cannot influence one another
        fixture_copy = deepcopy(fixture)
        fixture_copy.update({
            'some_other_variable': 'this is available in all scripts but not in fixture',
        })

        try:
            with open(test_file.absolute(), 'r') as code:
                exec(code.read(), fixture_copy, fixture_copy)
        except Failure:
            error_occured = True

    return error_occured
This iterates over all files in the directory tests-folder and executes them in order (with fixture.py first). It also makes the local variables, functions and classes from fixture.py available to each test-script.
A test script could then just be arbitrary code that will be executed and if it raises my custom Failure exception, this will be noted as a failed test.
The whole sequence is started with a script that does:
from my_module.test_sequence import run_test_sequence

if __name__ == '__main__':
    exit(run_test_sequence('tests-folder'))
This mostly works.
What it cannot do, and what leaves me unsatisfied with this approach:
I cannot debug the scripts themselves. Since the code is loaded as a string and then interpreted, breakpoints inside the test scripts are not recognized.
Calling fixture functions behaves weirdly. When I define a function in fixture.py like:
from my_hello_module import print_hello

def printer():
    print_hello()
I will receive a message during execution that print_hello is undefined, because the variables/modules/etc. in the scope surrounding printer are lost.
Stack traces are useless. On failure it shows the stack trace, but of course it only shows my line which says exec(...) and the insides of that function, and none of the code that has been loaded (see the compile() sketch below for a partial workaround).
I am sure there are other drawbacks, that I have not found yet, but these are the most annoying ones.
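One partial workaround for the traceback problem, sketched under the assumption that the exec() approach stays: compile the source with its real file name first, so frames in a stack trace point at the test script rather than at a bare exec(...) line:
# inside the loop of run_test_sequence(), instead of exec(code.read(), ...)
source = test_file.read_text()
# Compiling with the real path lets tracebacks (and linecache) show the
# offending line of the test script instead of "<string>".
compiled = compile(source, str(test_file), 'exec')
exec(compiled, fixture_copy, fixture_copy)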
I also tried to find a solution through __import__ but I couldn't get it to inject my custom locals or globals into the imported script.
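For reference, the standard library's runpy module comes close to what is described here: it runs a file under its real name (so tracebacks are usable) and accepts a dict of pre-seeded globals. A sketch, not a drop-in replacement for the fixture model above:
import runpy

# init_globals is merged into the module globals before the file runs;
# the returned dict holds whatever globals the script left behind.
result_globals = runpy.run_path(str(test_file), init_globals=fixture_copy)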
Is there a solution that I am too inexperienced to find, or another built-in Python function that actually does what I am trying to do? Or is there no way to achieve this, and should I rather have each test script import the fixture and the file/directory names itself? I want those scripts to have as few dependencies and as little Python-specific code as possible. Ideally they are just:
from my_module.test import *

click(x, y, LEFT)
write('admin')
press('tab')
write('password')
press('enter')

if text_on_screen('Login successful'):
    succeed('Test successful')
else:
    fail('Could not login')
Additional note: I think I had the debugger working when I still used execfile, but it is not available in python3 environments.
There is a python script start_test.py.
There is a second python script simple_test.py.
# pseudo code:
start_test.py --calls--> subprocess(python.exe simple_test.py, args_simple_test[])
The Python interpreter for both scripts is the same. So instead of opening a new instance, I want to run simple_test.py directly from start_test.py. I need to preserve the sys.argv environment. A nice-to-have would be to actually enter the following code section in simple_test.py:
# file: simple_test.py
if __name__ == '__main__':
    some_test_function()
Most important is that the approach should be a universal one, not depending on the content of simple_test.py.
This setup would provide two benefits:
The call is much less resource intensive
The whole stack of simple_test.py can be debugged with pycharm
So, how do I execute the call of a python script, from a python script, without starting a new subprocess?
"Executing a script" is a somewhat blurry term.
Typically the if __name__ == "__main__": part does the argument (sys.argv) decoding and then calls a worker function with explicit parameters. For clarity: it should not do anything else, since that additional work could not be reached without creating a new process, which causes exactly the overhead you are trying to avoid.
You simply bypass that and call this implementing routine directly.
So you end up with start_test.py containing something like:
from simple_test import worker
# ...
worker(typed_arg1, typed_arg2)
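Concretely, a sketch of that split (worker() and the argument names here are placeholders, not from the question):
# file: simple_test.py
import sys

def worker(input_path, verbose=False):
    # the actual test logic, callable from anywhere
    print('testing', input_path, 'verbose =', verbose)

if __name__ == '__main__':
    # only decode sys.argv here, then delegate
    worker(sys.argv[1], verbose='-v' in sys.argv)

# file: start_test.py
from simple_test import worker

if __name__ == '__main__':
    # call the implementation directly; no new interpreter process is started
    worker('some-input-file.txt', verbose=True)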
I wish to write a Python script that needs to do task 'A' and task 'B'. Luckily there are existing Python modules for both tasks, but unfortunately the library that can do task 'A' is Python 2 only, and the library that can do task 'B' is Python 3 only.
In my case the libraries are small and permissively-licensed enough that I could probably convert them both to Python 3 without much difficulty. But I'm wondering what is the "right" thing to do in this situation - is there some special way in which a module written in Python 2 can be imported directly into a Python 3 program, for example?
The "right" way is to translate the Py2-only module to Py3 and offer the translation upstream with a pull request (or equivalent approach for non-git upstream repos). Seriously. Horrible hacks to make py2 and py3 packages work together are not worth the effort.
I presume you know of tools such as 2to3 that aim to make the job of porting code to py3k easier; I'm just repeating it here for others' reference.
In situations where I have to use libraries from Python 3 and Python 2, I've been able to work around it using the subprocess module. Alternatively, I've gotten around this issue with shell scripts that pipe output from the python2 script to the python3 script and vice versa. This of course covers only a tiny fraction of use cases, but if you're transferring text (or maybe even picklable objects) between 2 & 3, it (or a more thought-out variant) should work.
To the best of my knowledge, there isn't a best practice when it comes to mixing versions of python.
I present to you an ugly hack
Consider the following simple toy example, involving three files:
# py2.py
# file uses python2, here illustrated by the print statement
def hello_world():
    print 'hello world'

if __name__ == '__main__':
    hello_world()

# py3.py
# there's nothing py3 about this, but let's assume that there is,
# and that this is a library that will work only on python3
def count_words(phrase):
    return len(phrase.split())

# controller.py
# main script that coordinates the work, written in python3;
# calls the python2 library through the subprocess module.
# The limitation here is that every function needed has to have a script
# associated with it that accepts command line arguments.
import subprocess
import py3

if __name__ == '__main__':
    phrase = subprocess.check_output('python py2.py', shell=True)
    num_words = py3.count_words(phrase)
    print(num_words)
# If I run the following in bash, it outputs `2`
hals-halbook: toy hal$ python3 controller.py
2
In my Python file, I have made a GUI widget that takes some inputs from the user. I have imported a Python module into my file that takes some input using raw_input(). I have to use this module as it is; I have no right to change it. When I run my Python file, it asks me for the inputs (due to the raw_input() of the imported module). I want to use the GUI widget inputs in that place.
How can I pass the user input (taken from the widget) to the raw_input() of the imported module?
First, if importing it directly into your script isn't actually a requirement (and it's hard to imagine why it would be), you can just run the module (or a simple script wrapped around it) as a separate process, using subprocess or pexpect.
Let's make this concrete. Say you want to use this silly module foo.py:
def bar():
    x = raw_input("Gimme a string")
    y = raw_input("Gimme another")
    return 'Got two strings: {}, {}'.format(x, y)
First, write a trivial foo_wrapper.py:
import foo
print(foo.bar())
Now, instead of calling foo.bar() directly in your real script, run foo_wrapper as a child process.
I'm going to assume that you already have the input you want to send it in a string, because that makes the irrelevant parts of the answer simpler (in fact, it makes them possible—if you wanted to use some GUI code for that, there's really no way I could show you how unless you first tell us which GUI library you're using).
So:
import subprocess
import sys

foo_input = 'String 1\nString 2\n'
# Plain Popen plus communicate (the context-manager form is Python 3.2+ only).
p = subprocess.Popen([sys.executable, 'foo_wrapper.py'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
foo_output, _ = p.communicate(foo_input)
Of course in real life you'll want to use an appropriate path for foo_wrapper.py instead of assuming that it's in the current working directory, but this should be enough to illustrate the idea.
Meanwhile, if "I have no right to change it" just means "I don't (and shouldn't) have checkin rights to the foo project's github site or the relevant subtree on our company's P4 server" or whatever, there's a really easy answer: Fork it, and change the fork.
Even if it's got a weak copyleft license like LGPL: fork it, change the fork, publish your fork under the same license as the original, then use your fork.
If you're depending on the foo package being installed on every target system, and can't depend on your replacement foo being installed instead, that's a bit more of a problem. But if the function or method that actually calls raw_input is just a small fraction of the actual code in foo, you can fix that by monkeypatching foo at runtime.
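A sketch of that idea, assuming the toy foo.py above and that bar() is the only place raw_input gets called:
import foo

def patched_bar():
    # hypothetical replacement: return canned values instead of prompting
    x, y = 'String 1', 'String 2'
    return 'Got two strings: {}, {}'.format(x, y)

foo.bar = patched_bar  # anything that later calls foo.bar() gets this version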
And that leads to the last-ditch possibility: You can always monkeypatch raw_input itself.
Again, I'm going to assume that you already have the input you need to give it to make things simpler.
So, first you write a replacement function:
foo_input = ['String 1\n', 'String 2\n']

def fake_raw_input(prompt):
    global foo_input
    return foo_input.pop(0)  # pop from the front so the strings come out in order
Now, there are two ways you can patch this in. Usually, you want to do this:
import foo
foo.raw_input = fake_raw_input
This means any code in foo that calls raw_input will see the function you crammed into its module globals instead of the normal builtin. Unless it does something really funky (like looking up the builtin directly and copying it to a local variable or something), this is the answer.
If you need to handle one of those really funky edge cases, and you don't mind doing something questionable, you can do this:
import __builtin__
__builtin__.raw_input = fake_raw_input
You must do this before the first import foo anywhere in your program. Also, it's not clear whether this is intentionally guaranteed to work, accidentally guaranteed to work (and should be fixed in the future), or not guaranteed to work. But it does work (at least for CPython 2.5-2.7, which is what you're probably using).
I am building an application plugin in Python which allows users to arbitrarily extend the application with simple scripts (working under Mac OS X). Executing Python scripts is easy, but some users are more comfortable with languages like Ruby.
From what I've read, I can easily execute Ruby scripts (or other arbitrary shell scripts) using subprocess and capture their output with a pipe; that's not a problem, and there's lots of examples online. However, I need to provide the script with multiple variables (say a chunk of text along with some simple boolean information about the text the script is modifying) and I'm having trouble figuring out the best way to do this.
Does anyone have a suggestion for the best way to accomplish this? My goal is to provide scripts with the information they need with the least required code needed for accessing that information within the script.
Thanks in advance!
See http://docs.python.org/library/subprocess.html#using-the-subprocess-module
args should be a string, or a sequence
of program arguments. The program to
execute is normally the first item in
the args sequence or the string if a
string is given, but can be explicitly
set by using the executable argument.
So, your call can look like this
p = subprocess.Popen( args=["script.sh", "-p", p_opt, "-v", v_opt, arg1, arg2] )
You can pass arbitrary Python values as the args of subprocess.Popen, as long as each one is converted to a string first.
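A sketch of that conversion (script.sh and the option values below are placeholders, not from the answer):
import subprocess

p_opt, v_opt = 8080, True  # hypothetical non-string values
p = subprocess.Popen(
    args=["./script.sh", "-p", str(p_opt), "-v", str(v_opt), "arg1", "arg2"])
p.wait()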
If you are going to be launching multiple scripts and need to pass the same information to each of them, you might consider using the environment (warning, I don't know Python, so the following code most likely sucks):
#!/usr/bin/python
import os

try:
    # if environment is set
    if os.environ["child"] == "1":
        print os.environ["string"]
except KeyError:
    # set environment
    os.environ["child"] = "1"
    os.environ["string"] = "hello world"

    # run this program 5 times as a child process
    for n in range(5):
        os.system(__file__)
One approach you could take would be to use json as a protocol between parent and child scripts, since json support is readily available in many languages, and is fairly expressive. You could also use a pipe to send an arbitrary amount of data down to the child process, assuming your requirements allow you to have the child scripts read from standard input. For example, the parent could do something like (Python 2.6 shown):
#!/usr/bin/env python
import json
import subprocess

data_for_child = {
    'text': 'Twas brillig...',
    'flag1': False,
    'flag2': True
}
child = subprocess.Popen(["./childscript"], stdin=subprocess.PIPE)
json.dump(data_for_child, child.stdin)
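One caveat worth noting: json.load() in the child reads until end-of-file, so the parent should close the pipe once the data has been written (otherwise the child may sit waiting until the parent exits):
child.stdin.close()  # signal EOF so json.load() in the child can return
child.wait()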
And here is a sketch of a child script:
#!/usr/bin/env python
# Imagine this were written in a different language.
import json
import sys
d = json.load(sys.stdin)
print d
In this trivial example, the output is:
$ ./foo12.py
{u'text': u'Twas brillig...', u'flag2': True, u'flag1': False}