I am running a Spark script from a bash wrapper, which is triggered from another Python script using subprocess. The issue is that even though the job runs fine, I don't see the Spark logs in my logfile.
Here is the code that actually runs the bash wrapper which includes the PySpark job.
def run_bash_cmd(self, cmd, errormsg="FAILED : ", max_retries=3, encoding=None):
logging.info("COMMAND : %s", cmd)
attempt = 1
while attempt <= max_retries:
try:
command = subprocess.Popen(
cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True
)
data, error = command.communicate()
if command.poll() != 0:
logging.warning("Attempt %s failed. Retry in 10 seconds.", attempt)
attempt += 1
                time.sleep(10)
continue
return data.decode() if encoding is None else data.decode(encoding)
except subprocess.CalledProcessError as e:
logging.warning("Attempt %s failed. Retry in 10 seconds", attempt)
attempt += 1
            time.sleep(10)
# spark-submit writes error to stdout rather than stderr so logging stdout as well in case of error
if encoding is None:
logging.error(data.decode())
else:
logging.error(data.decode(encoding))
logging.error(error.decode("utf-8"))
msg = "{0}\nError while running command '{1}' on {2}".format(
errormsg, cmd, self.server
)
How can I improve this so that I can also see the Spark logs in real time, rather than buffering the output and printing it at the end?
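For reference, a minimal sketch (not from the original post) of one way to stream the child process output into the logger line by line instead of buffering it with communicate(); the function name is just illustrative:
import logging
import subprocess

def run_bash_cmd_streaming(cmd):
    """Run cmd and forward its output to the log as it is produced."""
    logging.info("COMMAND : %s", cmd)
    proc = subprocess.Popen(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge the streams, since spark-submit writes to both
        universal_newlines=True,    # text mode
        bufsize=1,                  # line buffered
    )
    for line in proc.stdout:        # yields each line as soon as the child flushes it
        logging.info(line.rstrip())
    return proc.wait()              # exit code, so retry logic can still be applied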
I'm trying to use Python asyncio subprocesses to start an interactive SSH session and automatically input the password. The actual use case doesn't matter but it helps illustrate my problem. This is my code:
proc = await asyncio.create_subprocess_exec(
    'ssh', 'user@127.0.0.1',
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT,
stdin=asyncio.subprocess.PIPE,
)
# This loop could be replaced by async for, I imagine
while True:
buf = await proc.stdout.read()
if not buf:
break
print(f'stdout: { buf }')
I expected it to work something like asyncio streams, where I can create two tasks/subroutines/futures, one to listen to the StreamReader (in this case given by proc.stdout), the other to write to StreamWriter (proc.stdin).
However, it doesn't work as expected. The first few lines of output from the ssh command are printed directly to the terminal, until it gets to the password prompt (or host key prompt, as the case may be) and waits for manual input. I expected to be able to read the first few lines, check whether it was asking for the password or the host key prompt, and write to the StreamWriter accordingly.
The only time it runs the line print(f'stdout: { buf }') is after I press enter, when it prints, obviously, that "stderr: b'Host key verification failed.\r\n'".
I also tried the recommended proc.communicate(), which isn't as neat as using StreamReader/Writer, but it has the same problem: Execution freezes while it waits for manual input.
How is this actually supposed to work? If it's not how I imagined, why not, and is there any way to achieve this without resorting to some sort of busy loop in a thread?
PS: I'm explaining using ssh just for clarity. I ended up using plink for what I wanted, but I want to understand how to do this with python to run arbitrary commands.
This isn't a problem specific to asyncio. The ssh process does not interact with the stdin and stdout streams, but rather accesses the TTY device directly, in order to ensure that password entry is properly secured.
You have three options to work around this:
Don't use ssh, but some other SSH client, one that doesn't expect a TTY to control. For asyncio, you could use the asyncssh library. This library implements the SSH protocol directly, so it doesn't require a separate process, and it accepts username and password credentials directly.
Provide a pseudo-tty for SSH to talk to, one your Python program controls. The pexpect library provides a high-level API that does this for you and can be used to fully control the ssh command.
Set up an alternative password prompter for ssh to use. The ssh program can let something else handle password entry if there is no TTY, via the SSH_ASKPASS environment variable. Most versions of ssh are quite picky about when they'll accept SSH_ASKPASS, however: you need to set DISPLAY too, use the -n command-line switch for ssh, and use the setsid command to run ssh in a new session, disconnected from any TTY.
I've previously described how to use SSH_ASKPASS with asyncio in an answer to a question about git and ssh.
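For completeness, a rough sketch of that SSH_ASKPASS route; the helper script, host, and remote command here are placeholders, not something from the original question:
import asyncio
import os
import stat
import tempfile

async def ssh_with_askpass(host, password, remote_cmd):
    # throw-away askpass helper that just prints the password on stdout
    fd, askpass = tempfile.mkstemp(suffix='.sh')
    with os.fdopen(fd, 'w') as f:
        f.write('#!/bin/sh\necho "{}"\n'.format(password))
    os.chmod(askpass, stat.S_IRWXU)
    # DISPLAY must be set, setsid detaches ssh from the controlling TTY,
    # and -n keeps ssh from reading stdin, so it falls back to SSH_ASKPASS
    env = dict(os.environ, SSH_ASKPASS=askpass, DISPLAY=':0')
    proc = await asyncio.create_subprocess_exec(
        'setsid', 'ssh', '-n', host, remote_cmd,
        env=env,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    out, _ = await proc.communicate()
    os.unlink(askpass)
    return proc.returncode, out.decode()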
The path of least resistance is to use pexpect, as it supports asyncio natively (any method that accepts async_=True can be used as a coroutine):
import pexpect
child = pexpect.spawn('ssh user@127.0.0.1')
await child.expect('password:', timeout=120, async_=True)
child.sendline(password_for_user)
If anyone else landed here for a more generic answer to the question, see the following example:
import asyncio
async def _read_stream(stream, cb):
while True:
line = await stream.readline()
if line:
cb(line)
else:
break
async def _stream_subprocess(cmd, stdout_cb, stderr_cb):
process = await asyncio.create_subprocess_exec(*cmd,
stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    # read both streams concurrently until the process closes them
    await asyncio.gather(
        _read_stream(process.stdout, stdout_cb),
        _read_stream(process.stderr, stderr_cb),
    )
return await process.wait()
def execute(cmd, stdout_cb, stderr_cb):
loop = asyncio.get_event_loop()
rc = loop.run_until_complete(
_stream_subprocess(
cmd,
stdout_cb,
stderr_cb,
))
loop.close()
return rc
if __name__ == '__main__':
print(execute(
["bash", "-c", "echo stdout && sleep 1 && echo stderr 1>&2 && sleep 1 && echo done"],
lambda x: print("STDOUT: %s" % x),
lambda x: print("STDERR: %s" % x),
))
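As a side note, on Python 3.7 and later the explicit loop management in execute() can be replaced with asyncio.run:
def execute(cmd, stdout_cb, stderr_cb):
    # asyncio.run creates, runs and closes the event loop for us
    return asyncio.run(_stream_subprocess(cmd, stdout_cb, stderr_cb))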
Here is a demonstration of live output.
Briefly: start a bash process, pass an 'ls' command to it via stdin, then asynchronously read the result from its stdout.
proc = await asyncio.create_subprocess_exec(
'/bin/bash', '-i',
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT,
stdin=asyncio.subprocess.PIPE,
)
proc.stdin.write(b'ls \r\n')
await proc.stdin.drain()
try:
while True:
        # wait up to 3 seconds for a line, or raise asyncio.TimeoutError
        line = await asyncio.wait_for(proc.stdout.readline(), 3)
        print(line)
except asyncio.TimeoutError:
pass
Using this technique I was not able to log in to a server over ssh with a password;
I got stuck with the error "bash: no job control in this shell" after the command 'ssh -tt user@localhost'.
Have you tried using the AsyncSSH library (which uses Python's asyncio framework)? It seems like this is what you're looking for.
import asyncio, asyncssh, sys
async def run_client():
async with asyncssh.connect('localhost', username='myuser', password='secretpw') as conn:
result = await conn.run('ls abc', check=True)
print(result.stdout, end='')
try:
asyncio.get_event_loop().run_until_complete(run_client())
except (OSError, asyncssh.Error) as exc:
sys.exit('SSH connection failed: ' + str(exc))
It also has support for SSH keys via the client_keys param. Check the documentation; there are many examples for interactive input, I/O redirection, etc.
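For interactive input, a rough sketch along the lines of the examples in the asyncssh documentation (the remote command bc and the credentials are placeholders):
import asyncio
import asyncssh

async def run_interactive():
    async with asyncssh.connect('localhost', username='myuser', password='secretpw') as conn:
        async with conn.create_process('bc') as proc:
            proc.stdin.write('2 + 2\n')   # send a line to the remote process
            result = await proc.stdout.readline()
            print(result, end='')
            proc.stdin.write_eof()        # signal that no more input is coming

asyncio.get_event_loop().run_until_complete(run_interactive())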
I've been trying to write a Python script to control starting and stopping a Minecraft server. I've got it to accept commands through input(), but I also want the server's logs to be printed to the console (or processed in some way). Since the process never ends, readline hangs every time the server finishes outputting text, and no further input can be performed. Is there a way to let stdin and stdout work simultaneously, or a way to time out readline so I can continue?
The code I've got so far:
import subprocess
from subprocess import PIPE
import os
minecraft_dir = "D:\Minecraft Server"
executable = 'java -Xms4G -Xmx4G -jar "D:\Minecraft Server\paper-27.jar" java'
process = None
def server_command(cmd):
if(process is not None):
cmd = cmd + '\n'
cmd = cmd.encode("utf-8")
print(cmd)
process.stdin.write(cmd)
process.stdin.flush()
else:
print("Server is not running.")
def server_stop():
if process is None:
print("Server is not running.")
else:
process.stdin.write("stop\n".encode("utf-8"))
process.stdin.flush()
while True:
command=input()
command=command.lower()
if(command == "start"):
if process is None:
os.chdir(minecraft_dir)
process = subprocess.Popen(executable,stdin=PIPE,stdout=PIPE)
print("Server started.")
else:
print("Server Already Running.")
elif(command == "stop"):
server_stop()
process=None
else:
server_command(command)
I've mentioned processing the server log one way or another because I don't really need it on the console, since I can always read from the log file it generates. But this particular server I'm running needs the stdout=PIPE argument, or it throws:
java.io.IOException: ReadConsoleInputW failed
at org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:816)
at org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:99)
at org.jline.terminal.impl.jansi.win.JansiWinSysTerminal.processConsoleInput(JansiWinSysTerminal.java:112)
at org.jline.terminal.impl.AbstractWindowsTerminal.pump(AbstractWindowsTerminal.java:458)
at java.lang.Thread.run(Unknown Source)
and I think it breaks the pipe? No further input is directed to the process (process.stdin.write has no effect), yet the process is still running.
Any help on either of these issues would be greatly appreciated.
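One common workaround, assuming it is enough to dump the server log to the console, is to read the process's stdout in a background thread so the main thread stays free for input(); a rough sketch (reusing the executable variable from the script above):
import subprocess
import threading
from subprocess import PIPE

def pump_output(proc):
    # runs in a daemon thread; readline blocks here instead of in the main loop
    for line in iter(proc.stdout.readline, b''):
        print(line.decode(errors='replace'), end='')

process = subprocess.Popen(executable, stdin=PIPE, stdout=PIPE)
threading.Thread(target=pump_output, args=(process,), daemon=True).start()
# the main thread is now free to call input() and write to process.stdin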
Running Python 3.7.3 on Windows,
I have a situation where an asyncio event loop never breaks out of a process spawned from multiprocessing. I can't show all the code, but it's like this:
I use multiprocessing to speed up queries using a third-party API.
This API, thirdparty.api, supports a server-client architecture and uses an asyncio event loop internally. It runs the event loop in a separate thread; in that thread, it calls event_loop.run_forever() and breaks only on KeyboardInterrupt.
When I run the worker script with multiprocessing on its own, the API always returns, whether it succeeds or fails. Previously I hit a Py3.7.2 regression where, on Windows, the venv Python executable works in a bad way (https://bugs.python.org/issue35797), but that is fixed in Py3.7.3 and my problem persists.
When I run this script from another Py27 script using subprocess, and a query fails inside my multiprocessing worker process, the call never returns: it cannot break out of the worker process naturally, and even a generic exception handler won't catch anything; it just gets stuck.
code snippets of my caller script:
#!/usr/bin/env python2
import subprocess
import traceback

def main():
try:
cmd = ['C:\\Python\\Python37\\pythonw.exe', 'worker.py']
print(' '.join(cmd))
proc = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, err = proc.communicate()
except subprocess.CalledProcessError as err:
print(err)
except Exception:
traceback.print_exc()
else:
print('try popen finished with else.')
print('stdout: {}'.format(out))
print('stderr: {}'.format(err))
if __name__ == '__main__':
main()
Pseudo-code snippets of my worker worker.py function look like this:
#!/usr/bin/env python3
args = [
...
]
def worker(*mpargs):
with thirdparty.api() as myclient:
try:
myclient.query(*args)
except Exception:
traceback.print_exc()
def exception_worker(*mpargs):
raise RuntimeError('Making trouble!')
def main():
logging.info('STARTED!')
with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
results = pool.map(worker, args)
# results = pool.map(exception_worker, args)
pool.close()
pool.join()
logging.info('ALL DONE!')
if __name__ == '__main__':
main()
thirdparty.api starts event loop in its constructor:
self.loop = asyncio.get_event_loop()
if self.loop.is_closed():
self.loop = asyncio.new_event_loop()
asyncio.set_event_loop(self.loop)
then in its separate thread:
try:
    self.loop.run_forever()
except KeyboardInterrupt:
pass
self.loop.close()
I've tried another worker exception_worker which just throws exceptions, and this one returns without issues.
How should I solve this?
After detailing the issue, I finally found the solution in this post:
Why am I getting NotImplementedError with async and await on Windows?
The thirdparty.api needed to attend to this detail, and after fixing it, my problem was gone.
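For reference, the detail in question is the event loop type on Windows: the selector-based loop cannot spawn subprocesses and raises NotImplementedError, so the library needs something along these lines (a hedged sketch; the exact change inside thirdparty.api is an assumption):
import asyncio
import sys

if sys.platform == 'win32':
    # the ProactorEventLoop supports subprocesses on Windows
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()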
This is really interesting.
I have the following scripts on my Linux machine:
sleep.py
import time
from datetime import datetime
print(datetime.now())
time.sleep(20)
print('Over!')
print(datetime.now())
loop.py
import time
for i in range(20):
time.sleep(1)
print(i)
I can terminate them directly with Ctrl+C if I log in through PuTTY or git-bash.
But when I try to run the Python scripts from a Windows console with the following:
test.py
def ssh_pty_command(cmd, ip, username, passwd=None, key_filename=None):
"""run ssh.exec_command with realtime output and return exit_code."""
try:
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
logging.debug('Connecting to remote server {}...'.format(ip))
ssh.connect(ip, 22, username, password=passwd,
key_filename=key_filename, timeout=5)
stdin, stdout, stderr = ssh.exec_command(cmd, get_pty=True)
logging.info('ssh_In: {}'.format(cmd))
# print '$: {}'.format(cmd)
for line in iter(stdout.readline, ""):
logging.info('ssh_Out: {}'.format(
line.rstrip('\n').encode('utf-8')))
for err in iter(stderr.readline, ""):
logging.error('ssh_Error: {}'.format(
err.rstrip().encode('utf-8')))
exit_code = stdout.channel.recv_exit_status()
logging.debug('Task exit with code {}.\n'.format(exit_code))
return exit_code
except Exception as err:
logging.error('*** Caught SSH exception: %s: %s' %
(err.__class__, err))
raise
finally:
ssh.close()
ssh_pty_command('python loop.py',ip,username)
ssh_pty_command('python sleep.py',ip,username)
When I press Ctrl+C, loop.py terminates immediately, but sleep.py waits until the time.sleep(20) has finished and only then terminates.
How can I terminate the sleep.py immediately?
Note that I did try using get_pty=True in the exec_command call in my function, but it didn't help.
I guess it has something to do with the signal sent by Paramiko, but I'm not sure where to dig in...
Ctrl+C signals an interrupt. The Python interpreter checks for the interrupt regularly and raises a KeyboardInterrupt exception when it detects one.
When Paramiko is waiting for incoming data on a socket, the check for interrupts is probably suspended (just guessing). So if the remote command is not producing any output, you cannot break the local script.
Your loop.py produces output using print(i), so your local Python script can be interrupted at the moment it is processing that output.
In any case, it's not the remote script that cannot be interrupted, it's the local script. So it probably has nothing to do with time.sleep as such.
See also:
Stopping python using ctrl+c
https://docs.python.org/3/library/exceptions.html#KeyboardInterrupt
If you actually do not want to wait for the command to finish, your question is not really about Python (nor Paramiko), but about Linux. See Execute remote commands, completely detaching from the ssh connection.
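If you do want to keep waiting for the command but stay interruptible, one workaround (a sketch, not from the answer above, reusing ssh, cmd and logging from the question's function) is to poll the channel instead of blocking in readline, so Ctrl+C has a chance to be delivered between reads:
import time

stdin, stdout, stderr = ssh.exec_command(cmd, get_pty=True)
channel = stdout.channel
buf = ''
while not channel.exit_status_ready() or channel.recv_ready():
    if channel.recv_ready():
        buf += channel.recv(4096).decode('utf-8', errors='replace')
        while '\n' in buf:
            line, buf = buf.split('\n', 1)
            logging.info('ssh_Out: {}'.format(line))
    else:
        time.sleep(0.1)   # short sleep keeps the loop responsive to KeyboardInterrupt
if buf:
    logging.info('ssh_Out: {}'.format(buf))
exit_code = channel.recv_exit_status()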