How to send EOF using scala process utility - python

I want to start a Python program from inside a Scala program. The Python program has to receive a possibly arbitrarily long string, so it is not possible to pass it as a command-line argument.
My solution is to transmit the data via the standard streams. However, I am unable to find the Scala equivalent of the following working bash code:
bash code:
#!/bin/bash
var="SOME REALLY LONG STRING THAT IS SEND TO THE PYTHON PROGRAM"
echo "$var" | ./readUntilEOF.py
scala code:
import sys.process._

object main {
  def main(args: Array[String]): Unit = {
    val cmd = "./readUntilEOF.py"
    val string = "SOME REALLY LONG STRING THAT IS SEND TO THE PYTHON PROGRAM"
    print("I am starting to send stuff...")
    val resultString = (string #| cmd.!!).!!
    print(resultString)
  }
}
readUntilEOF.py:
#!/usr/bin/python3
import sys
if __name__ == "__main__":
    read = sys.stdin.read()
    print(read)
Output running the bash command:
#> ./scalaBashEquivalent.sh
SOME REALLY LONG STRING THAT IS SEND TO THE PYTHON PROGRAM
Output running the scala code:
#> scala scala.sc
I am starting to send stuff...
/* and then it never terminates */

#< can take an InputStream, so try:
(cmd #< new ByteArrayInputStream(string.getBytes)).!!
scastie
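Put together, a minimal self-contained version of that suggestion (reusing cmd, string and readUntilEOF.py from the question) might look like this:

import java.io.ByteArrayInputStream
import scala.sys.process._

object main {
  def main(args: Array[String]): Unit = {
    val cmd = "./readUntilEOF.py"
    val string = "SOME REALLY LONG STRING THAT IS SEND TO THE PYTHON PROGRAM"
    println("I am starting to send stuff...")
    // #< feeds the stream to the child's stdin and closes it afterwards,
    // which is the EOF that sys.stdin.read() in the Python script waits for.
    val resultString = (cmd #< new ByteArrayInputStream(string.getBytes("UTF-8"))).!!
    print(resultString)
  }
}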

It is indeed a bit more complex than expected, but the code below seems to work.
import java.io.PrintWriter
import scala.sys.process._

object main {
  def main(args: Array[String]): Unit = {
    val cmd = "./readUntilEOF.py"
    val string = "SOME REALLY LONG STRING THAT IS SEND TO THE PYTHON PROGRAM"
    println("I am starting to send stuff...")
    val processOutput: StringBuilder = new StringBuilder()
    val process = Process(cmd).run(new ProcessIO(
      in => {
        // write the payload, then close the child's stdin so it sees EOF
        val writer = new PrintWriter(in)
        writer.write(string)
        writer.close()
      },
      out => {
        processOutput.addAll(scala.io.Source.fromInputStream(out))
        out.close()
      },
      _.close()
    ))
    assert(process.exitValue() == 0)
    print(processOutput.toString)
  }
}

Related

Node.js child process to Python process

I must send text from a Node.js process to a Python child process.
My dummy Node client looks like this:
var resolve = require('path').resolve;
var spawn = require('child_process').spawn;
data = "lorem ipsum"
var child = spawn('master.py', []);
var res = '';
child.stdout.on('data', function (_data) {
    try {
        var data = Buffer.from(_data, 'utf-8').toString();
        res += data;
    } catch (error) {
        console.error(error);
    }
});
child.stdout.on('exit', function (_) {
    console.log("EXIT:", res);
});
child.stdout.on('end', function (_) {
    console.log("END:", res);
});
child.on('error', function (error) {
    console.error(error);
});
child.stdout.pipe(process.stdout);
child.stdin.setEncoding('utf-8');
child.stdin.write(data + '\r\n');
while the Python process master.py is
#!/usr/bin/env python
import sys
import codecs
if sys.version_info[0] >= 3:
    ifp = codecs.getreader('utf8')(sys.stdin.buffer)
else:
    ifp = codecs.getreader('utf8')(sys.stdin)

if sys.version_info[0] >= 3:
    ofp = codecs.getwriter('utf8')(sys.stdout.buffer)
else:
    ofp = codecs.getwriter('utf8')(sys.stdout)

for line in ifp:
    tline = "<<<<<" + line + ">>>>>"
    ofp.write(tline)

# close files
ifp.close()
ofp.close()
I must use a UTF-8 encoded input reader, which is why I wrap sys.stdin, but it seems that when Node.js writes to the child process's stdin using child.stdin.write(data + '\r\n');, the data is never read by the for line in ifp: loop.
You'll need to call child.stdin.end() in the Node program after the final call to child.stdin.write(). Until end() is called, the child.stdin writable stream will hold the written data in a buffer, so the Python program won't see it. See the Buffering discussion in https://nodejs.org/docs/latest-v8.x/api/stream.html#stream_buffering for details.
(If you write lots of data into stdin then the write buffer will eventually fill to a point where the accumulated data will be flushed out automatically to the Python program. The buffer will then begin again to collect data. An end() call is needed to make sure that the final portion of the written data is flushed out. It also has the effect of indicating to the child process that no more data will be sent on this stream.)

Can I pass a list of lists from Node to python easily?

I know there are some issues with passing more complicated data structures, such as a list of lists, to a Python script via the CLI.
I was wondering if running a python script from node code had any of these same issues.
Basically, say I have the following code in a node app:
const spawn = require("child_process").spawn;
const pythonProcess = spawn('python',["path/to/script.py", arg1, arg2, arg3]);
(The code above is taken from another question.)
Suppose that arg1 and arg2 are lists of lists in the node app. And suppose arg3 is a double.
The corresponding code in my script.py file that is meant to parse and receive these arguments into variables looks like so:
import sys
if __name__ == '__main__':
    oc = sys.argv[1]
    nc = sys.argv[2]
    r = sys.argv[3]
Will oc and nc here be lists of lists in python? Or does something else need to be done to get this working?
The easiest way to pass complex structures is to serialize them first in some common data format, such as JSON:
const myList = ["foo", "bar", "baz"];
const { spawn } = require("child_process");
const python = spawn('python',["script.py", JSON.stringify(myList)]);
And deserialize it on the callee side:
import sys, json
if __name__ == '__main__':
    my_list = json.loads(sys.argv[1])
But instead of passing serialized parameters as command-line arguments, it is better to use the stdin and stdout streams for exchanging data larger than a few hundred bytes:
const { spawn } = require("child_process");
const python = spawn('python', ["script.py"]);
const buffers = [];
python.stdout.on('data', (chunk) => buffers.push(chunk));
python.stdout.on('end', () => {
    const result = JSON.parse(Buffer.concat(buffers));
    console.log('Python process exited, result:', result);
});
python.stdin.write(JSON.stringify(["foo", "bar", "baz"]));
python.stdin.end()
And accept it from sys.stdin via json.load, which takes a stream instead of a string:
import sys, json
if __name__ == '__main__':
    my_list = json.load(sys.stdin)
    json.dump(my_list, sys.stdout)
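For readers arriving from the Scala question at the top of this page, the same JSON-over-stdin exchange can also be driven from Scala. A minimal sketch, assuming python is on the PATH and script.py is the deserializing script shown above (the hand-written JSON literal stands in for a real serializer):

import java.io.ByteArrayInputStream
import scala.sys.process._

object JsonOverStdin {
  def main(args: Array[String]): Unit = {
    val json = """["foo", "bar", "baz"]"""
    // #< writes the bytes to the child's stdin and closes it,
    // so json.load(sys.stdin) on the Python side sees EOF and returns.
    val result = ("python script.py" #< new ByteArrayInputStream(json.getBytes("UTF-8"))).!!
    println(result)
  }
}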

ImportError: No module named seqfmt

I am running a python script from Groovy via:
Process p = Runtime.getRuntime().exec("python /Users/afrieden/Projects/hgvs/hgvs/tests/test_gsg_variants.py");
String s = null;
BufferedReader stdInput = new BufferedReader(new InputStreamReader(p.getInputStream()));
BufferedReader stdError = new BufferedReader(new InputStreamReader(p.getErrorStream()));
System.out.println("Here is the standard output of the command:\n");
while ((s = stdInput.readLine()) != null) {
    System.out.println(s);
}

// read any errors from the attempted command
System.out.println("Here is the standard error of the command (if any):\n");
while ((s = stdError.readLine()) != null) {
    System.out.println(s);
}
However, it imports what looks like a Cython library, seqfmt (seqfmt.c and seqfmt.pyx).
I have added the relevant directories to sys.path:
import sys
sys.path.append("/Users/afrieden/Projects/hgvs/build/lib/")
sys.path.append("/Users/afrieden/pythonLib/pygr-0.8.2/")
sys.path.append("/Users/afrieden/pythonLib/pygr-0.8.2/pygr/seqfmt.pyx")
sys.path.append("/Users/afrieden/pythonLib/pygr-0.8.2/pygr/seqfmt.c")
import hgvs
import csv
import hgvs.utils
from pygr.seqdb import SequenceFileDB
Any thoughts on how I can get it to run? Thanks!
EDIT:
It does work with python from the command line just fine.
Simplifying your script slightly, does this work:
def proc = [ 'bash', '-c', 'python /Users/afrieden/Projects/hgvs/hgvs/tests/test_gsg_variants.py' ].execute()
StringWriter out = new StringWriter()
StringWriter err = new StringWriter()
proc.waitForProcessOutput( out, err )
println 'Here is the standard output of the command:'
println out.toString()
println 'Here is the standard error of the command (if any):'
println err.toString()

Executing many sub processes in groovy fails

I need to create a script that calls an application (c++ binary) 4000 times. The application takes some arguments and for each call writes a zip file to disk. So when the script is executed 4000 zip files will be written to disk. The application supports multiple threads.
I first created a bash script that does the job and it works fine. But now I need the script to be platform independent. I have therefore tried to port the script to groovy, something like this:
for (int i = 1; i <= 4000; i++) {
    def command = """myExecutable
a=$argA
b=$outDir"""

    def proc = command.execute()              // Call *execute* on the string
    proc.waitFor()                             // Wait for the command to finish

    // Obtain status and output
    println "return code: ${proc.exitValue()}"
    println "stderr: ${proc.err.text}"
    println "stdout: ${proc.in.text}"          // *out* from the external program is *in* for groovy
    println "iteration : " + i
}
But after 381 zip files have been written to disk, the script just hangs. Do I need to close the process after each call or something similar?
Here:
http://groovy.codehaus.org/Process+Management
it says that it is known that java.lang.Process might hang or deadlock. Is it a no-go to do something like this in Groovy?
I will also give it a try in Python to see if it has the same problems.
It might be the output stream blocking:
(1..<4000).each { i ->
    println "iteration : $i"

    def command = """myExecutable
a=$argA
b=$outDir"""

    def proc = command.execute()

    // Consume the outputs from the process and pipe them to our output streams
    proc.consumeProcessOutput( System.out, System.err )

    // Wait for the command to finish
    proc.waitFor()

    // Obtain status
    println "return code: ${proc.exitValue()}"
}
Yes, you should close the streams that belong to the process.
Or, as @tim_yates says, you should use consumeProcessOutput, or, in a concurrent solution, waitForProcessOutput, which closes them for you.
For parallel execution you could use something like this:
import groovyx.gpars.GParsPool
GParsPool.withPool(8) { // Start a pool with 8 threads.
    (1..4000).toList().eachParallel {
        def p = "myExecutable a=$argA b=$outDir".execute()
        def sout = new StringBuffer()
        def serr = new StringBuffer()
        p.waitForProcessOutput(sout, serr)
        synchronized (System.out) {
            println "return code: ${p.exitValue()}"
            println "stderr: $serr"
            println "stdout: $sout"
            println "iteration $it"
        }
    }
}
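For readers coming from the Scala question at the top of this page, the same idea (drain the child's output so it can never block on a full pipe buffer, and bound the concurrency with a fixed pool) can be sketched with scala.sys.process and Futures. This is only a sketch; myExecutable, argA and outDir are the hypothetical names from the question:

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.sys.process._

object RunMany {
  def main(args: Array[String]): Unit = {
    val argA = "someValue"   // hypothetical argument
    val outDir = "/tmp/out"  // hypothetical output directory

    val pool = Executors.newFixedThreadPool(8) // at most 8 concurrent child processes
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    val runs = (1 to 4000).map { i =>
      Future {
        val out = new StringBuilder
        val err = new StringBuilder
        // ProcessLogger drains stdout/stderr, so the child cannot hang on a full pipe
        val code = Seq("myExecutable", s"a=$argA", s"b=$outDir") ! ProcessLogger(out.append(_), err.append(_))
        (i, code, err.toString)
      }
    }

    Await.result(Future.sequence(runs), Duration.Inf).foreach { case (i, code, serr) =>
      println(s"iteration $i: return code $code")
      if (serr.nonEmpty) println(s"stderr: $serr")
    }
    pool.shutdown()
  }
}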

NSTask requires flush when reading from a process' stdout, Terminal does not

I have a simple Python script that asks for your name, then spits it back out:
import sys

def main():
    print('Enter your name: ')
    for line in sys.stdin:
        print 'You entered: ' + line

main()
Pretty simple stuff! When running this in the OS X Terminal, it works great:
$ python nameTest.py
Enter your name:
Craig^D
You entered: Craig
But, when attempting to run this process via an NSTask, the stdout only appears if additional flush() calls are added to the Python script.
This is how I have my NSTask and piping configured:
NSTask *_currentTask = [[NSTask alloc] init];
_currentTask.launchPath = @"/usr/bin/python";
_currentTask.arguments = [NSArray arrayWithObject:@"nameTest.py"];

NSPipe *pipe = [[NSPipe alloc] init];
_currentTask.standardOutput = pipe;
_currentTask.standardError = pipe;

dispatch_queue_t stdout_queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0);

__block dispatch_block_t checkBlock;
checkBlock = ^{
    NSData *readData = [[pipe fileHandleForReading] availableData];
    NSString *consoleOutput = [[NSString alloc] initWithData:readData encoding:NSUTF8StringEncoding];

    dispatch_sync(dispatch_get_main_queue(), ^{
        [self.consoleView appendString:consoleOutput];
    });

    if ([_currentTask isRunning]) {
        [NSThread sleepForTimeInterval:0.1];
        checkBlock();
    } else {
        dispatch_sync(dispatch_get_main_queue(), ^{
            NSData *readData = [[pipe fileHandleForReading] readDataToEndOfFile];
            NSString *consoleOutput = [[NSString alloc] initWithData:readData encoding:NSUTF8StringEncoding];
            [self.consoleView appendString:consoleOutput];
        });
    }
};

dispatch_async(stdout_queue, checkBlock);

[_currentTask launch];
But when running the NSTask, this is how it appears (it is initially blank, but after entering my name and pressing CTRL+D, it finishes all at once):
Craig^DEnter your name:
You entered: Craig
So, my question is: How can I read the stdout from my NSTask without requiring the additional flush() statements in my Python script? Why does the Enter your name: prompt not appear immediately when run as an NSTask?
When Python sees that its standard output is a terminal, it arranges to automatically flush sys.stdout when the script reads from sys.stdin. When you run the script using NSTask, the script's standard output is a pipe, not a terminal, so the output is block-buffered and nothing appears until the buffer fills, the script flushes explicitly, or the script exits.
UPDATE
There is a Python-specific solution to this. You can pass the -u flag to the Python interpreter (e.g. _currentTask.arguments = @[ @"-u", @"nameTest.py" ];), which tells Python not to buffer standard input, standard output, or standard error at all. You can also set PYTHONUNBUFFERED=1 in the process's environment to achieve the same effect.
ORIGINAL
A more general solution that applies to any program uses what's called a “pseudo-terminal” (or, historically, a “pseudo-teletype”), which we shorten to just “pty”. (In fact, this is what the Terminal app itself does. It is a rare Mac that has a physical terminal or teletype connected to a serial port!)
Each pty is actually a pair of virtual devices: a slave device and a master device. The bytes you write to the master, you can read from the slave, and vice versa. So these devices are more like sockets (which are bidirectional) than like pipes (which are one-directional). In addition, a pty also lets you set terminal I/O flags (or “termios”) that control whether the slave echoes its input, whether it passes on its input a line at a time or a character at a time, and more.
Anyway, you can open a master/slave pair easily with the openpty function. Here's a little category that you can use to make an NSTask object use the slave side for the task's standard input and output.
NSTask+PTY.h
@interface NSTask (PTY)
- (NSFileHandle *)masterSideOfPTYOrError:(NSError **)error;
@end
NSTask+PTY.m
#import "NSTask+PTY.h"
#import <util.h>
@implementation NSTask (PTY)
- (NSFileHandle *)masterSideOfPTYOrError:(NSError *__autoreleasing *)error {
    int fdMaster, fdSlave;
    int rc = openpty(&fdMaster, &fdSlave, NULL, NULL, NULL);
    if (rc != 0) {
        if (error) {
            *error = [NSError errorWithDomain:NSPOSIXErrorDomain code:errno userInfo:nil];
        }
        return NULL;
    }
    fcntl(fdMaster, F_SETFD, FD_CLOEXEC);
    fcntl(fdSlave, F_SETFD, FD_CLOEXEC);
    NSFileHandle *masterHandle = [[NSFileHandle alloc] initWithFileDescriptor:fdMaster closeOnDealloc:YES];
    NSFileHandle *slaveHandle = [[NSFileHandle alloc] initWithFileDescriptor:fdSlave closeOnDealloc:YES];
    self.standardInput = slaveHandle;
    self.standardOutput = slaveHandle;
    return masterHandle;
}
@end
You can use it like this:
NSTask *_currentTask = [[NSTask alloc] init];
_currentTask.launchPath = @"/usr/bin/python";
_currentTask.arguments = @[[[NSBundle mainBundle] pathForResource:@"nameTest" ofType:@"py"]];

NSError *error;
NSFileHandle *masterHandle = [_currentTask masterSideOfPTYOrError:&error];
if (!masterHandle) {
    NSLog(@"error: could not set up PTY for task: %@", error);
    return;
}
Then you can read from the task and write to the task using masterHandle.
