Capturing sqoop logs using stdout Popen - python

Below is the Python code I am running to call Sqoop, but it is not capturing any of the logs except for the few lines below:
Warning: /usr/hdp/2.6.4.0-91/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
import subprocess

job = "sqoop-import --direct --connect 'jdbc:sqlserver://host' --username myuser --password-file /user/ivr_sqoop --table data_app_det --delete-target-dir --verbose --split-by attribute_name_id --where \"db_process_time BETWEEN ('2018-07-15') and ('9999-12-31')\""
print job

with open('save.txt', 'w') as fp:
    proc = subprocess.Popen(job, stdout=fp, stderr=subprocess.PIPE, shell=True)
    stdout, stderr = proc.communicate()
    print "Here is the return code :: " + str(proc.returncode)
    print stdout
Please let me know if there is an issue with the way I am calling it.
Note: the individual Sqoop command runs fine on its own and produces all the logs.
I have tried the following as well; the result is the same:
import subprocess
job = "sqoop-import --direct --connect 'jdbc:sqlserver://host' --username myuser --password-file /user/ivr_sqoop --table data_app_det --delete-target-dir --verbose --split-by attribute_name_id --where \"db_process_time BETWEEN ('2018-07-15') and ('9999-12-31')\""
proc = subprocess.Popen(job, stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
stdout, stderr = proc.communicate()
and I also tried appending '2> mylog.log' at the end of the command:
import subprocess
job = "sqoop-import --direct --connect 'jdbc:sqlserver://host' --username myuser --password-file /user/ivr_sqoop --table data_app_det --delete-target-dir --verbose --split-by attribute_name_id --where \"db_process_time BETWEEN ('2018-07-15') and ('9999-12-31')\" > mylog.log "
proc = subprocess.Popen(job, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
stdout, stderr = proc.communicate()
I found the similar question below, but it has no answer either:
Subprocess Popen : Ignore Accumulo warning and continue execution of Sqoop

Since you have added shell=True, it is not capturing the Sqoop logs. Remove shell=True from your call and add universal_newlines=True; it will then capture the console log.
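For instance, a minimal sketch using the asker's own command: shlex.split turns the string into an argument list so shell=True is not needed, and since Sqoop (via log4j) typically writes its progress logging to stderr, both streams are captured:
import shlex
import subprocess

job = ("sqoop-import --direct --connect 'jdbc:sqlserver://host' "
       "--username myuser --password-file /user/ivr_sqoop "
       "--table data_app_det --delete-target-dir --verbose "
       "--split-by attribute_name_id "
       "--where \"db_process_time BETWEEN ('2018-07-15') and ('9999-12-31')\"")

# shlex.split honours the quoting, so the --where clause stays one argument.
proc = subprocess.Popen(shlex.split(job),
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)
stdout, stderr = proc.communicate()
print(stdout)
print(stderr)  # most of Sqoop's logging usually arrives here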
The working piece of code:
import subprocess
import logging

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

# Function to run a Hadoop/Sqoop command and capture its output
def run_unix_cmd(args_list):
    """
    Run Linux commands.
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

# Create Sqoop job
def sqoop_job():
    """
    Create Sqoop job.
    """
    cmd = ['sqoop', 'import', '--connect', 'jdbc:oracle:thin:@//host:port/schema',
           '--username', 'user', '--password', 'XX', '--query', '"your query"',
           '-m', '1', '--target-dir', 'tgt_dir']
    print(cmd)
    (ret, out, err) = run_unix_cmd(cmd)
    print(ret, out, err)
    if ret == 0:
        logging.info('Success.')
    else:
        logging.error('Error.')

if __name__ == '__main__':
    sqoop_job()

Related

How can I use run instead of communicate when providing text on stdin?

I'm trying to figure out how to do this:
command = f"adb -s {i} shell"
proc = Popen(command, stdin=PIPE, stdout=PIPE)
out, err = proc.communicate(f'dumpsys package {app_name} | grep version'.encode('utf-8'))
but with run():
command = f"adb -s {i} shell"
proc = run(command, stdin=PIPE, stdout=PIPE, shell=True)
out, err = run(f'dumpsys package {app_name} | grep version', shell=True, text=True, stdin=proc.stdout )
The idea is to run a command that requires some kind of input (for example, entering a shell) and afterwards send another command to that shell.
I've found a way online with communicate, but I wonder how to do it with the run() function.
Thanks!
You only need to call run once -- pass the remote command in the input argument (and don't use shell=True in places where you don't need it).
import subprocess, shlex

# text=True is needed so the str passed as input (and the captured
# output) are handled as text rather than bytes.
proc = subprocess.run(['adb', '-s', i, 'shell'],
                      capture_output=True, text=True,
                      input=f'dumpsys package {shlex.quote(app_name)} | grep version')
shlex.quote prevents an app name that contains $(...), ;, etc. from running unwanted commands on your device.
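For completeness, a small usage sketch (the variable names i and app_name come from the question; with capture_output=True the remote command's output lands in proc.stdout):
# proc.stdout holds whatever the dumpsys | grep pipeline printed
# inside the adb shell; proc.returncode is adb's exit status.
print(proc.returncode)
print(proc.stdout)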

Live stdout output from Python subprocess in Jupyter notebook

I'm using subprocess to run a command line program from a Python (3.5.2) script, which I am running in a Jupyter notebook. The subprocess takes a long time to run and so I would like its stdout to be printed live to the screen in the Jupyter notebook.
I can do this no problem in a normal Python script run from the terminal. I do this using:
def run_command(cmd):
    from subprocess import Popen, PIPE
    import shlex
    with Popen(shlex.split(cmd), stdout=PIPE, bufsize=1, universal_newlines=True) as p:
        for line in p.stdout:
            print(line, end='')
    exit_code = p.poll()
    return exit_code
However, when I run the script in a Jupyter notebook, it does not print the stdout live to the screen. Instead, it prints everything after the subprocess has finished running.
Does anyone have any ideas on how to remedy this?
Many thanks,
Johnny
The IPython notebook has its own support for running shell commands. If you don't need to capture the output with subprocess, you can just do:
cmd = 'ls -l'
!{cmd}
Output from commands executed with ! is automatically piped through the notebook.
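If you do want the text in a Python variable, IPython can capture it as well (a small sketch; lines is IPython's list-like SList):
cmd = 'ls -l'
lines = !{cmd}   # IPython captures the command's stdout into an SList
print(lines.n)   # .n joins the captured lines with newlines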
If you set stdout = None (this is the default, so you can omit the stdout argument altogether), then your process should write its output to the terminal running your IPython notebook server.
This happens because the default behavior is for the subprocess to inherit the parent's file handles (see the docs).
Your code would look like this:
from subprocess import Popen
import shlex

def run_command(cmd):
    # stdout=None (the default) lets the child write straight to the
    # terminal that is running the notebook server.
    p = Popen(shlex.split(cmd), bufsize=1, universal_newlines=True)
    return p.wait()  # p.poll() here would return None immediately
This won't print to the notebook in browser, but at least you will be able to see the output from your subprocess asynchronously while other code is running.
Hope this helps.
Jupyter mucks with stdout and stderr. This should get you what you want, and it gives a more useful exception when the command fails.
import signal
import subprocess as sp
import sys

class VerboseCalledProcessError(sp.CalledProcessError):
    def __str__(self):
        if self.returncode and self.returncode < 0:
            try:
                msg = "Command '%s' died with %r." % (
                    self.cmd, signal.Signals(-self.returncode))
            except ValueError:
                msg = "Command '%s' died with unknown signal %d." % (
                    self.cmd, -self.returncode)
        else:
            msg = "Command '%s' returned non-zero exit status %d." % (
                self.cmd, self.returncode)
        return f'{msg}\n' \
               f'Stdout:\n' \
               f'{self.output}\n' \
               f'Stderr:\n' \
               f'{self.stderr}'

def bash(cmd, print_stdout=True, print_stderr=True):
    proc = sp.Popen(cmd, stderr=sp.PIPE, stdout=sp.PIPE, shell=True,
                    universal_newlines=True, executable='/bin/bash')
    all_stdout = []
    all_stderr = []
    while proc.poll() is None:
        for stdout_line in proc.stdout:
            if stdout_line != '':
                if print_stdout:
                    print(stdout_line, end='')
                all_stdout.append(stdout_line)
        for stderr_line in proc.stderr:
            if stderr_line != '':
                if print_stderr:
                    print(stderr_line, end='', file=sys.stderr)
                all_stderr.append(stderr_line)
    stdout_text = ''.join(all_stdout)
    stderr_text = ''.join(all_stderr)
    if proc.wait() != 0:
        raise VerboseCalledProcessError(proc.returncode, cmd, stdout_text, stderr_text)
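Usage might look like this (the command is just a stand-in):
# Streams output live, captures it, and raises VerboseCalledProcessError
# with full stdout/stderr attached if the command fails.
bash('for i in 1 2 3; do echo step $i; done')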
Replacing the for loop with the explicit readline() call worked for me.
from subprocess import Popen, PIPE
import shlex

def run_command(cmd):
    with Popen(shlex.split(cmd), stdout=PIPE, bufsize=1, universal_newlines=True) as p:
        while True:
            line = p.stdout.readline()
            if not line:
                break
            print(line)
        exit_code = p.poll()
    return exit_code
Something is still broken about their iterators, even 4 years later.
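For reference, a hypothetical call (the command is only an example):
exit_code = run_command('ping -c 3 localhost')
print('exit code:', exit_code)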

Subprocess.Popen spits output on screen even with stdout=subprocess.PIPE

I'm running multiple commands in a single shell line:
e.g. cd foo/bar; ../../run_this -arg1 -arg2="yeah_ more arg1 arg2" arg3=/my/path finalarg
Running with:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
(out, err) = p.communicate()
But this spits the output onto the screen (Python 2.7.5), and out is an empty string.
You have shell=True, so you're basically reading the standard output of the spawned shell, not the standard output of the program you want to run.
I'm guessing you're using shell=True to accommodate the directory changing. Fortunately, subprocess can take care of that for you (by passing a directory via the cwd keyword argument):
import subprocess
import shlex

directory = 'foo/bar'
cmd = '../../run_this -arg1 -arg2="yeah_ more arg1 arg2" arg3=/my/path finalarg'
p = subprocess.Popen(shlex.split(cmd), cwd=directory, stdout=subprocess.PIPE)
(out, err) = p.communicate()  # err is None here since stderr was not redirected
As per the comments, I added stderr too and that worked:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,stderr=subprocess.STDOUT)

Capturing *all* terminal output of a program called from Python

I have a program which can be executed as:
./install.sh
This installs a bunch of stuff, and quite a lot of activity happens on screen.
Now, I am trying to execute it via
p = subprocess.Popen(executable, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
With the hope that all the activity happening on the screen is captured in out (or err). However, content is printed directly to the terminal while the process is running, and not captured into out or err, which are both empty after the process is run.
What could be happening here? How can this content be captured?
In general, what you're doing is already sufficient to channel all output to your variables.
One exception to that is if the program you're running is using /dev/tty to connect directly to its controlling terminal, and emitting output through that terminal rather than through stdout (FD 1) and stderr (FD 2). This is commonly done for security-sensitive IO such as password prompts, but rarely seen otherwise.
As a demonstration that this works, you can copy-and-paste the following into a Python shell exactly as given:
import subprocess
executable = ['/bin/sh', '-c', 'echo stdout; echo stderr >&2']
p = subprocess.Popen(executable, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print "---"
print "output: ", out
print "stderr: ", err
...by contrast, for a demonstration of the case that doesn't work:
import subprocess
executable = ['/bin/sh', '-c', 'echo uncapturable >/dev/tty']
p = subprocess.Popen(executable, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print "---"
print "output: ", out
In this case, content is written to the TTY directly, not to stdout or stderr. This content cannot be captured without using a program (such as script or expect) that provides a fake TTY. So, to use script:
import subprocess
executable = ['script', '-q', '/dev/null',
              '/bin/sh', '-c', 'echo uncapturable >/dev/tty']
p = subprocess.Popen(executable, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print "---"
print "output: ", out

popen command not giving required output

I am using the code below to run the git command "git tag -l --contains ad0beef66e5890cde6f0961ed03d8bc7e3defc63". If I run the command standalone I see the required output, but through the program below it doesn't work. Does anyone have any input on what could be wrong?
from subprocess import check_call,Popen,PIPE
revtext = "ad0beef66e5890cde6f0961ed03d8bc7e3defc63"
proc = Popen(['git', 'tag', '-l', '--contains', revtext ],stdout=PIPE ,stderr=PIPE)
(out, error) = proc.communicate()
print "OUT"
print out
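A natural first debugging step (a suggestion, not from the original thread) is to inspect the captured stderr and return code as well, since git writes its diagnostics to stderr:
print "ERROR"
print error  # git's diagnostics, if any, land here
print "return code :: " + str(proc.returncode)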
