I am running a Spark script from a bash wrapper, which is in turn triggered from another Python script using subprocess. The issue is that even though the job runs fine, I don't see the Spark logs in my logfile.
Here is the code that runs the bash wrapper containing the PySpark job:
import logging
import subprocess
import time

def run_bash_cmd(self, cmd, errormsg="FAILED : ", max_retries=3, encoding=None):
    logging.info("COMMAND : %s", cmd)
    attempt = 1
    while attempt <= max_retries:
        try:
            command = subprocess.Popen(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True
            )
            # communicate() buffers everything until the process exits,
            # so nothing reaches the logfile while the job is running
            data, error = command.communicate()
            if command.poll() != 0:
                logging.warning("Attempt %s failed. Retry in 10 seconds.", attempt)
                attempt += 1
                time.sleep(10)
                continue
            return data.decode() if encoding is None else data.decode(encoding)
        except subprocess.CalledProcessError:
            logging.warning("Attempt %s failed. Retry in 10 seconds.", attempt)
            attempt += 1
            time.sleep(10)
    # spark-submit writes errors to stdout rather than stderr,
    # so log stdout as well in case of error
    if encoding is None:
        logging.error(data.decode())
    else:
        logging.error(data.decode(encoding))
    logging.error(error.decode("utf-8"))
    msg = "{0}\nError while running command '{1}' on {2}".format(
        errormsg, cmd, self.server
    )
How can I improve this so that I can also see the Spark logs in real time, rather than buffering everything and printing it at the end?
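One way to get the output in real time (a minimal sketch, not the original code: it assumes line-oriented output and omits the retry logic for brevity) is to read the child's stdout line by line and forward each line to logging as it arrives:

import logging
import subprocess

def run_bash_cmd_streaming(cmd):
    # Merge stderr into stdout so both streams arrive in one place,
    # then read line by line instead of waiting for communicate()
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True
    )
    for line in iter(process.stdout.readline, b""):
        logging.info(line.decode().rstrip())
    process.stdout.close()
    return process.wait()  # exit code of the command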
I have a program that runs from my local computer and connects via SSH (paramiko package) to a Linux computer.
I use the following functions to send a command and get an exit_code to make sure it's done.
For some reason, sometimes an exit code is returned, whereas sometimes the code enters an endless loop.
Does anyone know why this happens and how to make it stable?
def check_on_command(self, stdin, stdout, stderr):
    if stdout is None:
        raise Exception("Tried to check command before it was ready")
    if not stdout.channel.exit_status_ready():
        return None
    else:
        return stdout.channel.recv_exit_status()

def run_command(self, command):
    (stdin, stdout, stderr) = self.client.exec_command(command)
    logger.info(f"Execute command: {command}")
    while self.check_on_command(stdin, stdout, stderr) is None:
        time.sleep(5)
    logger.info(f"Finish running, exit code: {stdout.channel.recv_exit_status()}")
In case you're using Python >= 3.6, I advise working with an asynchronous library, which provides await capabilities for better run times and simpler, more manageable code.
For example, you can use the asyncssh library (a third-party package, installed with pip install asyncssh), which does the job as requested. In general, async code that sleeps in a loop to wait for a task to finish can be replaced like so:
import asyncio, asyncssh, sys

async def run_client():
    async with asyncssh.connect('localhost') as conn:
        result = await conn.run('ls abc')
        if result.exit_status == 0:
            print(result.stdout, end='')
        else:
            print(result.stderr, end='', file=sys.stderr)
            print('Program exited with status %d' % result.exit_status,
                  file=sys.stderr)

try:
    asyncio.get_event_loop().run_until_complete(run_client())
except (OSError, asyncssh.Error) as exc:
    sys.exit('SSH connection failed: ' + str(exc))
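As a side note, on Python 3.7+ the preferred way to drive the coroutine is asyncio.run():

# Python 3.7+ equivalent of get_event_loop().run_until_complete()
try:
    asyncio.run(run_client())
except (OSError, asyncssh.Error) as exc:
    sys.exit('SSH connection failed: ' + str(exc))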
You can find further documentation here: asyncssh
I am writing a Python script to calculate packet loss by pinging IP addresses with the subprocess module on Linux. Multiple IP addresses are kept in a CSV file. It runs fine when only pingable destinations are given.
But it throws an error when a non-pingable IP is given in the CSV file, and the script then exits without checking the other IP addresses in that file. So I am not able to capture the packet loss for the non-pingable destinations, which is the main purpose of the script.
Please suggest a way forward.
subprocess.check_output(['ping', '-c 4', hostname], shell=False,
                        universal_newlines=True).splitlines()
subprocess.CalledProcessError: Command '['ping', '-c 4', '192.168.134.100']' returned non-zero exit status 1
This is just subprocess raising an error because ping exits with a non-zero status on 100% packet loss, an unreachable destination, or any other problem. What you could do is:
try:
    # subprocess code here
    output = subprocess.check_output(['ping', '-c', '4', hostname],
                                     universal_newlines=True).splitlines()
except subprocess.CalledProcessError:
    # Handle the non-pingable destination here,
    # e.g. print("Destination unreachable..") or something else
    pass  # pass lets the script continue on to the next IP after the error
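To actually capture the packet-loss figure, even for unreachable hosts, something like the following sketch could work (my illustration, not part of the original answer: it relies on ping printing its statistics line before exiting non-zero, and the regex is an assumption about the Linux ping output format):

import re
import subprocess

def packet_loss(hostname):
    try:
        output = subprocess.check_output(
            ['ping', '-c', '4', hostname],
            stderr=subprocess.STDOUT, universal_newlines=True)
    except subprocess.CalledProcessError as e:
        # ping exited non-zero, but its statistics were still printed
        output = e.output
    match = re.search(r'(\d+(?:\.\d+)?)% packet loss', output)
    return float(match.group(1)) if match else None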
Try this Code:
import subprocess

def systemCommand(Command):
    Output = ""
    Error = ""
    try:
        # universal_newlines=True decodes the output to str,
        # so split("\n") also works on Python 3
        Output = subprocess.check_output(Command, stderr=subprocess.STDOUT,
                                         shell=True, universal_newlines=True)
    except subprocess.CalledProcessError as e:
        # An invalid command or a failed ping raises this exception
        Error = e.output
    if Output:
        Stdout = Output.split("\n")
    else:
        Stdout = []
    if Error:
        Stderr = Error.split("\n")
    else:
        Stderr = []
    return (Stdout, Stderr)
# in main
Host = "ip to ping"
NoOfPackets = 2
Timeout = 5000  # in milliseconds

# Command for Windows
Command = 'ping -n {0} -w {1} {2}'.format(NoOfPackets, Timeout, Host)
# Command for Linux (note: Linux ping takes -w in seconds, not milliseconds)
# Command = 'ping -c {0} -w {1} {2}'.format(NoOfPackets, Timeout, Host)

Stdout, Stderr = systemCommand(Command)
if Stdout:
    print("Host [{}] is reachable.".format(Host))
else:
    print("Host [{}] is unreachable.".format(Host))
I have a wrapper around Paramiko's SSHClient.exec_command(). I'd like to capture standard out. Here's a shortened version of my function:
def __execute(self, args, sudo=False, capture_stdout=True, plumb_stderr=True,
              ignore_returncode=False):
    argstr = ' '.join(pipes.quote(arg) for arg in args)
    channel = ssh.get_transport().open_session()
    channel.exec_command(argstr)
    channel.shutdown_write()

    # Handle stdout and stderr until the command terminates
    captured = []

    def do_capture():
        while channel.recv_ready():
            o = channel.recv(1024)
            if capture_stdout:
                captured.append(o)
            else:
                sys.stdout.write(o)
                sys.stdout.flush()
        while plumb_stderr and channel.recv_stderr_ready():
            sys.stderr.write(channel.recv_stderr(1024))
            sys.stderr.flush()

    while not channel.exit_status_ready():
        do_capture()

    # We get data after the exit status is available, why?
    for i in xrange(100):
        do_capture()

    rc = channel.recv_exit_status()
    if not ignore_returncode and rc != 0:
        raise Exception('Got return code %d executing %s' % (rc, args))

    if capture_stdout:
        return ''.join(captured)

paramiko.SSHClient.execute = __execute
In do_capture(), whenever channel.recv_ready() tells me that I can receive data from the command's stdout, I call channel.recv(1024) and append the data to my buffer. I stop when the command's exit status is available.
However, it seems like more stdout data comes at some point after the exit status.
# We get data after the exit status is available, why?
for i in xrange(100):
do_capture()
I can't just call do_capture() once, as it seems like channel.recv_ready() will return False for a few milliseconds, and then True, and more data is received, and then False again.
I'm using Python 2.7.6 with Paramiko 1.15.2.
I encountered the same problem. After the command exits there may still be data in the stdout or stderr buffers, still on its way over the network, or whatever else. I read through Paramiko's source code, and apparently all data has been read once chan.recv() returns an empty string.
So this is my attempt to solve it; it's been working so far.
import socket
from contextlib import closing

def run_cmd(ssh, cmd, stdin=None, timeout=-1, recv_win_size=1024):
    '''
    Run command on server, optionally sending data to its stdin

    Arguments:
    ssh           -- An instance of paramiko.SSHClient connected
                     to the server the commands are to be executed on
    cmd           -- The command to run on the remote server
    stdin         -- String to write to command's standard input
    timeout       -- Timeout for command completion in seconds.
                     Set to None to make the execution blocking.
    recv_win_size -- Size of chunks the output is read in

    Returns:
    A tuple containing (exit_status, stdout, stderr)
    '''
    with closing(ssh.get_transport().open_session()) as chan:
        chan.settimeout(timeout)
        chan.exec_command(cmd)
        if stdin:
            chan.sendall(stdin)
        chan.shutdown_write()

        stdout, stderr = [], []

        # Until the command exits, read from its stdout and stderr
        while not chan.exit_status_ready():
            if chan.recv_ready():
                stdout.append(chan.recv(recv_win_size))
            if chan.recv_stderr_ready():
                stderr.append(chan.recv_stderr(recv_win_size))

        # Command has finished, read exit status
        exit_status = chan.recv_exit_status()

        # Ensure we gobble up all remaining data
        while True:
            try:
                sout_recvd = chan.recv(recv_win_size)
                if not sout_recvd and not chan.recv_ready():
                    break
                else:
                    stdout.append(sout_recvd)
            except socket.timeout:
                continue

        while True:
            try:
                serr_recvd = chan.recv_stderr(recv_win_size)
                if not serr_recvd and not chan.recv_stderr_ready():
                    break
                else:
                    stderr.append(serr_recvd)
            except socket.timeout:
                continue

        stdout = ''.join(stdout)
        stderr = ''.join(stderr)

        return (exit_status, stdout, stderr)
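Usage would look something like this (hypothetical host, user, and command, assuming an already-connected paramiko.SSHClient):

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('example.com', username='user')  # hypothetical host and user

exit_status, out, err = run_cmd(ssh, 'ls -la /tmp')
print('exit status: %d\n%s' % (exit_status, out))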
I encountered the same issue.
This link (Paramiko: how to ensure data is received between commands) gave me some help, explaining that after exit_status_ready() returns True you still have to receive possible additional data. In my tests (with a couple of screens of output), in every single run there was additional data to read after exit_status_ready() returned True.
But the way it reads the remaining data is not correct: it uses recv_ready() to check whether there is something to read, and once recv_ready() returns False it exits. That will work most of the time, but the following situation can happen: recv_ready() can return False to indicate that at that moment there is nothing to receive, which doesn't mean it is the end of all the data. In my tests, I would leave the test running, and sometimes it would take half an hour for the issue to appear.
I found the solution by reading the following sentence in the Channel.recv() documentation: "If a string of length zero is returned, the channel stream has closed."
So we can just have a single loop that reads all the data until recv() returns a zero-length result. At that point the channel stream is closed, but just to make sure the exit status is ready we can add an additional loop that sleeps until channel.exit_status_ready() returns True.
Note that this will work only with a channel without a pty enabled (which is the default).
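A minimal sketch of that single-loop approach (my illustration of the description above, not the answerer's code; chan is assumed to be a channel on which exec_command() has already been called):

import time

def drain_channel(chan, win_size=1024):
    # recv() blocks until data arrives and returns b'' only when the
    # channel stream has closed, i.e. all data has been received
    chunks = []
    while True:
        data = chan.recv(win_size)
        if not data:
            break
        chunks.append(data)
    # Stream closed; make sure the exit status is ready before reading it
    while not chan.exit_status_ready():
        time.sleep(0.1)
    return b''.join(chunks), chan.recv_exit_status()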
I have a function in a Python script that checks whether a remote server is up, using ping. If the server is not up, the function should wait until it is up and only then return. But the script doesn't return if the remote server is down; if the remote server is up, the script returns.
I have tried both subprocess and os.system; I also tried various ping parameters like -c, -w and -W, but nothing seems to help. Any ideas on what I might be doing wrong?
Here is the code:
def waitTillUp():
    command = "ping -c 1 -W 2 " + remoteServer
    response = os.system(command)
    if response == 0:
        print "UP\n"
    else:
        print "Down\n"
    '''
    args = shlex.split(command)
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, err = p.communicate()
    logging.debug("Waiting for the Remote Server to be up")
    while "4 packets transmitted, 4 received" not in output:
        logging.debug("Waiting for the remoteServer to be up")
        p = subprocess.Popen(args, stdout=subprocess.PIPE)
        output, err = p.communicate()
    '''
Ideally it should loop until the server is up, but just to check whether it returns at all, I have put in an if/else condition. When the remote server does actually come up, the script is still stuck and does not stop. Once I hit Enter in the window, it returns. Any suggestions are welcome.
Update #2
Right now I am just trying the following, without checking for any response or anything. I just expect the program to fall through to main and then reach the end of the program, but it is not even doing that.
def waitTillUp():
    command = "ping -c 1 " + Params.storageArray
    response = os.system(command)
I am using a lot of subprocess.Popen calls one after the other, so is it possible that some buffer is not getting cleared, or something of that sort? Could this be the reason for the weird behavior?
Update #3
The problem is probably in the reboot call to the remote server, before the code I pasted. I changed a few things and realized that the function where I am trying to do the reboot is the point where execution does not return.
This is the complete code. With logging statements I am able to determine that the reboot call, marked below as the "CULPRIT CALL", is the point where execution gets stuck and does not proceed until it gets an Enter key from the user.
def waitTillUp():
    command = "ping -c 1 " + remoteServer
    response = os.system(command)

def execCmd(op, command):
    logging.info("Executing %s operation(command: %s)" % (op, command))
    args = shlex.split(command)
    sys.stdout.flush()
    p = subprocess.Popen(args, stdout=subprocess.PIPE)
    output, err = p.communicate()
    logging.debug("Output of %s operation: %s" % (op, output))

def install():
    execCmd("chmod", "ssh root@" + remoteServer + " chmod +x ~/OS*.bin")
    execCmd("reboot", "ssh root@" + remoteServer + " reboot -nf")  ### CULPRIT CALL
    waitTillUp()

def main():
    install()
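One likely fix for the hang (an editor's sketch, and an assumption rather than a confirmed answer: ssh tends to block when the remote end dies mid-session, so detaching the reboot lets the session close cleanly before the box goes down; -n keeps ssh from reading local stdin, which matches the symptom that pressing Enter unblocks the script):

def install():
    execCmd("chmod", "ssh root@" + remoteServer + " chmod +x ~/OS*.bin")
    # Run reboot in the background on the remote side and return at once
    execCmd("reboot",
            "ssh -n root@" + remoteServer +
            " 'nohup reboot -nf >/dev/null 2>&1 &'")
    waitTillUp()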
I am calling an external process multiple times, in a loop. To give you pseudocode:

for i in xrange(1, 100):
    call external proc which inserts a row into a table

The problem here is that whenever the external process is called, it runs as a separate process, which could take any amount of time to run, so Python continues with its own execution. This causes the next insertion to run into a row lock, which prevents it.
What is the ideal way to wait for the process to complete, under the following constraints:
I cannot modify the way the external process works.
I know I can, but I do not want to use a hack like time.sleep
I cannot modify any DB settings.
The code for calling the external proc is:
def run_query(query, username, password):
    try:
        process = subprocess.Popen(
            "<path to exe> -u " + username + " -p " + password + " " + query,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE)
        result, error = process.communicate()
        if error != '':
            _pretty_error('stderr', error)
    except OSError, error:
        _pretty_error('OSError', str(error))
    return result
You have several options according to the subprocess documentation:
- Calling process.wait() after running process = subprocess.Popen(...)
- Using subprocess.call instead of Popen
- Using subprocess.check_call instead of Popen (see the sketch after the examples below)
Depending on how the result looks, one way would be to use wait():

process = subprocess.Popen(
    "<path to exe> -u " + username + " -p " + password + " " + query,
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE)
# note: with stdout=PIPE, wait() can deadlock if the child fills the pipe;
# the subprocess docs recommend communicate() in that case
retcode = process.wait()
You can try to start the process like:
process = subprocess.call(
    ["<path to exe>", "-u", username, "-p", password, query],
    shell=False)
This way the main process blocks until the subprocess ends, but you don't get the output.
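For the third option from the list above, a brief sketch (using the question's placeholder executable and its _pretty_error helper): check_call behaves like call, but raises CalledProcessError on a non-zero exit status, so failures cannot be silently ignored:

try:
    # Blocks until the subprocess finishes; raises on non-zero exit
    subprocess.check_call(
        ["<path to exe>", "-u", username, "-p", password, query],
        shell=False)
except subprocess.CalledProcessError as e:
    _pretty_error('returncode', str(e.returncode))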