How to disable logging to console (pig embedded in python) - python

I am currently playing with embedding pig into python, and whenever I run the file it works, but it clogs up the command line with output like the following:
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop/lib/hue-plugins-2.3.0-cdh4.3.0.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop/lib/paranamer-2.3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop/lib/avro-1.7.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop/lib/commons-configuration-1.6.jar'
Command line input:
pig embedded_pig_testing.py -p /home/cloudera/Documents/python_testing_config_files/test002_config.cfg
The parameter passed is a file that contains a bunch of variables I am using in the test.
Is there any way to get the script to not log these actions to the command line?

Logging in Java programs/libraries is usually configured by means of a configuration or .properties file. I'm sure there's one for Pig. Something that might be what you're looking for is http://svn.apache.org/repos/asf/pig/trunk/conf/pig.properties.
EDIT: looks like this is specific to Jython.
I have not been able to determine if it's possible at all to disable this, but unless I could find something cleaner, I'd consider simply redirecting sys.stderr (or sys.stdout) during the .jar loading phase:
import os
import sys
old_stdout, old_stderr = sys.stdout, sys.stderr
sys.stdout = sys.stderr = open(os.devnull, 'w')
do_init() # load your .jar's here
sys.stdout, sys.stderr = old_stdout, old_stderr
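A slightly more defensive sketch of the same idea restores the streams even if the initialization step raises (do_init() is the same placeholder as above):
import os
import sys

old_stdout, old_stderr = sys.stdout, sys.stderr
devnull = open(os.devnull, 'w')
sys.stdout = sys.stderr = devnull
try:
    do_init()  # load your .jar's here
finally:
    # put the real streams back even if do_init() failed
    sys.stdout, sys.stderr = old_stdout, old_stderr
    devnull.close()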

This logging comes from jython scanning your Java packages to build a cache for later use: https://wiki.python.org/jython/PackageScanning
So long as your script only uses full class imports (no import x.y.* wildcards), you can disable package scanning via the python.cachedir.skip property:
pig ... -Dpython.cachedir.skip=true ...
Frustratingly, I believe jython writes these messages to stdout instead of stderr, so piping stderr elsewhere won't help you out.
Another option is to use streaming python instead of jython whenever pig 0.12 ships. See PIG-2417 for more details on that.

Related

Save console output in txt file as it happens

I want to save my console output in a text file, but I want it to be as it happens so that if the program crashes, logs will be saved.
Do you have some ideas?
I can't just specify a file in the logger because I have a lot of different loggers that print to the console.
I think you can indeed use a logger, just adding a file handler; see the logging module documentation.
As an example you can use something like this, which logs both to the terminal and to a file:
import logging
from pathlib import Path

root_path = Path("<YOUR PATH>")  # directory the log file should be written to
log_level = logging.DEBUG

# Print to the terminal
logging.root.setLevel(log_level)
formatter = logging.Formatter("%(asctime)s | %(levelname)s | %(message)s", "%Y-%m-%d %H:%M:%S")
stream = logging.StreamHandler()
stream.setLevel(log_level)
stream.setFormatter(formatter)

log = logging.getLogger("pythonConfig")
if not log.hasHandlers():
    log.setLevel(log_level)
    log.addHandler(stream)

    # file handler:
    file_handler = logging.FileHandler(Path(root_path / "process.log"), mode="w")
    file_handler.setLevel(log_level)
    file_handler.setFormatter(formatter)
    log.addHandler(file_handler)

log.info("test")
If you have multiple loggers, you can still use this solution: loggers inherit from their ancestors, so put the handlers on the root logger and make sure the other loggers propagate to it.
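A minimal sketch of that pattern (the logger names here are only illustrative): attach the handlers once to the root logger and let the named loggers propagate to it:
import logging

# attach the handlers once, on the root logger
formatter = logging.Formatter("%(asctime)s | %(name)s | %(levelname)s | %(message)s")
for handler in (logging.StreamHandler(), logging.FileHandler("process.log")):
    handler.setFormatter(formatter)
    logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.DEBUG)

# named loggers need no handlers of their own; their records propagate to the root
logging.getLogger("module_a").info("goes to the console and to process.log")
logging.getLogger("module_b").warning("so does this")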
As an alternative, you can use the nohup command, which keeps the process running even if the terminal closes and sends the output to the desired location:
nohup python main.py > log_file.out &
There are many ways to do this, but they are not all equally suitable (maintainability, ease of use, reinventing the wheel, etc.).
If you don't mind using your operating system's built-ins you can:
forward the standard output and error streams to a file of your choice with python3 -u ./myscript.py > outputfile.txt 2>&1.
forward the standard output and error streams to a file of your choice AND display them on the console too with python3 -u ./myscript.py 2>&1 | tee outputfile.txt. The -u option makes the output unbuffered (i.e. what goes into the pipe comes out immediately).
If you want to do it from the Python side you can:
use the logging module to output the generated logs to a file handle instead of the standard output.
override the stdout and stderr streams defined in sys (sys.stdout and sys.stderr) so that they point to an opened file handle of your choice. For instance sys.stdout = open("log-stdout.txt", "w").
As a personal preference: the simpler, the better. The logging module is made for the purpose of logging and provides all the necessary mechanisms to achieve what you want, so I would suggest you stick with it. Here is a link to the logging module documentation, which also provides many examples, from simple use to more complex and advanced use.
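For instance, if a single log file is all you need, a minimal sketch with basicConfig (the file name is arbitrary) is enough:
import logging

# send all records at INFO and above to a file instead of the console
logging.basicConfig(filename="output.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logging.info("this goes to output.log, not to the terminal")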

Redirecting output of launchfile called from Python

I have a simple piece of code which runs a specified launchfile:
import roslaunch

uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
roslaunch.configure_logging(uuid)
file = [(roslaunch.rlutil.resolve_launch_arguments(cli)[0], cli[2:])]
launch = roslaunch.parent.ROSLaunchParent(uuid, file)
Execution of the launchfile stuff generates lots of logging output on stdout/err, so the actual script's output is getting lost.
Is it possible to somehow redirect or disable printing it on the screen?
Two options:
via env vars (export).
via python logging module.
Using env vars
You can point ROS logging at a /dev/null-like file system.
See ROS env vars: https://wiki.ros.org/ROS/EnvironmentVariables
export ROS_LOG_DIR=/black/hole
Use https://github.com/abbbi/nullfsvfs to create the dir.
Using python logging
import logging
logging.getLogger("rospy").setLevel(logging.CRITICAL)
https://docs.python.org/3/library/logging.html#logging-levels
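If other ROS Python loggers are also noisy, the same call can be applied to them; the logger names below are assumptions, so check logging.Logger.manager.loggerDict to see which loggers actually exist in your process:
import logging

# raise the threshold on the ROS-related loggers (names assumed) before
# creating the ROSLaunchParent
for name in ("rospy", "rosout", "roslaunch"):
    logging.getLogger(name).setLevel(logging.CRITICAL)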
How about redirecting stdout and stderr to files? As long as you redirect them before your calls to roslaunch it should put all the output into the files you point to (or /dev/null to ignore it).
import sys
sys.stdout = open('redirect.out','w')
sys.stderr = open('redirect.err','w')

How do I embed my shell scanning-script into a Python script?

I've been using the following shell command to read the image off a scanner named scanner_name and save it in a file named file_name:
scanimage -d <scanner_name> --resolution=300 --format=tiff --mode=Color 2>&1 > <file_name>
This has worked fine for my purposes.
I'm now trying to embed this in a Python script. What I need is to save the scanned image, as before, into a file, and also to capture any standard output (say, error messages) to a string.
I've tried
scan_result = os.system('scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name))
But when I run this in a loop (with different scanners), there is an unreasonably long lag between scans and the images aren't saved until the next scan starts (the file is created as an empty file and is not filled until the next scanning command). All this with scan_result=0, i.e. indicating no error
The subprocess method run() has been suggested to me, and I have tried
with open(file_name, 'w') as scanfile:
    input_params = '-d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} '.format(scanner, file_name)
    scan_result = subprocess.run(["scanimage", input_params], stdout=scanfile, shell=True)
but this saved the image in some kind of an unreadable file format
Any ideas as to what may be going wrong? Or what else I can try that will allow me to both save the file and check the success status?
subprocess.run() is definitely preferred over os.system() but neither of them as such provides support for running multiple jobs in parallel. You will need to use something like Python's multiprocessing library to run several tasks in parallel (or painfully reimplement it yourself on top of the basic subprocess.Popen() API).
You also have a basic misunderstanding about how to run subprocess.run(). You can pass in either a string and shell=True or a list of tokens and shell=False (or no shell keyword at all; False is the default).
import subprocess

with_shell = subprocess.run(
    "scanimage -d {} --resolution=300 --format=tiff --mode=Color 2>&1 > {} ".format(
        scanner, file_name), shell=True)

with open(file_name, "wb") as write_handle:
    no_shell = subprocess.run([
        "scanimage", "-d", scanner, "--resolution=300", "--format=tiff",
        "--mode=Color"], stdout=write_handle)
You'll notice that the latter does not support redirection (because that's a shell feature) but this is reasonably easy to implement in Python. (I took out the redirection of standard error -- you really want error messages to remain on stderr!)
If you have a larger working Python program this should not be awfully hard to integrate with a multiprocessing.Pool(). If this is a small isolated program, I would suggest you peel off the Python layer entirely and go with something like xargs or GNU parallel to run a capped number of parallel subprocesses.
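A rough sketch of the multiprocessing.Pool approach, assuming a list of (scanner, output file) pairs that you supply yourself (the device and file names below are hypothetical):
import subprocess
from multiprocessing import Pool

def scan_one(job):
    scanner, file_name = job
    # exec-style call: the image bytes from stdout go straight into the file,
    # error messages stay on stderr and are captured for inspection
    with open(file_name, "wb") as out:
        result = subprocess.run(
            ["scanimage", "-d", scanner, "--resolution=300",
             "--format=tiff", "--mode=Color"],
            stdout=out, stderr=subprocess.PIPE)
    return scanner, result.returncode, result.stderr.decode(errors="replace")

if __name__ == "__main__":
    jobs = [("scanner1", "scan1.tiff"), ("scanner2", "scan2.tiff")]  # hypothetical names
    with Pool(processes=len(jobs)) as pool:
        for scanner, status, errors in pool.map(scan_one, jobs):
            print(scanner, "exit status:", status, errors)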
I suspect the issue is you're opening the output file, and then running the subprocess.run() within it. This isn't necessary. The end result is, you're opening the file via Python, then having the command open the file again via the OS, and then closing the file via Python.
JUST run the subprocess, and let the scanimage ... 2>&1 > filename command create the file (just as it would if you ran scanimage at the command line directly).
I think subprocess.check_output() is now the preferred method of capturing the output.
I.e.
from subprocess import check_output

# Command must be a list, with all parameters as separate list items.
# Shell redirections such as '2>&1>file' cannot go in the list: there is no
# shell here, so check_output captures scanimage's stdout (the image) directly.
command = ['scanimage',
           '-d{}'.format(scanner),
           '--resolution=300',
           '--format=tiff',
           '--mode=Color']
scan_result = check_output(command)        # raw TIFF bytes from stdout
with open(file_name, 'wb') as image_file:  # write the captured image to disk
    image_file.write(scan_result)
However (with both run and check_output), shell=True is a big security risk, especially if the input_params come into the Python script from an external source. People can pass in unwanted commands and have them run in the shell with the permissions of the script.
Sometimes shell=True is necessary for the OS command to run properly; in that case the best recommendation is to use an actual Python module to interface with the scanner, rather than having Python pass an OS command to the OS.
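If shell=True really cannot be avoided, one mitigation (a sketch, Python 3) is to quote every interpolated value with shlex.quote before building the command string:
import shlex
import subprocess

# quoting prevents characters such as ';' or '&&' inside scanner/file_name
# from being interpreted as additional shell commands
cmd = "scanimage -d {} --resolution=300 --format=tiff --mode=Color > {}".format(
    shlex.quote(scanner), shlex.quote(file_name))
subprocess.run(cmd, shell=True, check=True)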

From a Python script, run all Python scripts in a directory and stream output

My file structure looks like this:
runner.py
scripts/
  something_a/
    main.py
    other_file.py
  something_b/
    main.py
    anythingelse.py
  something_c/
    main.py
  ...
runner.py should look at all folders in scripts/ and run the main.py located there.
Right now I'm achieving this through subprocess.check_output. It works, but some of these scripts take a long time to run and I don't get to see any progress; it prints everything after the process has finished.
I'm hoping to find a solution that allows for 2 things to be done somewhat easily:
1) Stream the output instead of getting it all at the end
2) Doesn't prohibit running multiple scripts at once
Is this possible? A lot of the solutions I've seen for running a Python script from another require knowledge of the other script's name/location. I can also enforce that all the main.py's have a specific function if that helps.
You could use Popen to loop through each script and write its output to a separate log file. Then you could read from these files in real time while each one is being populated. :)
Presenting the output in a more readable form is a little trickier. You could create another script which reads these log files (again with Popen, or plain file reads) and decide how you'd like the information presented in an understandable manner.
""" Use the same command as you would do for check_output """
cmd = ''
for filename in scriptList:
log = filename + ".log"
with io.open(filename, mode=log) as out:
subprocess.Popen(cmd, stdout=out, stderr=out)
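If you also want to watch the output live in runner.py rather than only in the log files, a sketch along these lines (assuming the directory layout from the question) reads each child's stdout line by line as it is produced:
import subprocess
import sys
from pathlib import Path

for main_script in sorted(Path("scripts").glob("*/main.py")):
    proc = subprocess.Popen(
        [sys.executable, "-u", str(main_script)],   # -u: unbuffered child output
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:                         # stream lines as they appear
        print("[{}] {}".format(main_script.parent.name, line), end="")
    proc.wait()
This streams one script at a time; to run several at once you would start all the Popen objects first and then read from their pipes in separate threads or with the selectors module.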

reset python interpreter for logging

I am new to Python. I'm using Vim with Python-mode to edit and test my code and noticed that if I run the same code more than once, the logging file will only be updated the first time the code is run. For example below is a piece of code called "testlogging.py"
#!/usr/bin/python2.7
import os.path
import logging
filename_logging = os.path.join(os.path.dirname(__file__), "testlogging.log")
logging.basicConfig(filename=filename_logging, filemode='w',
                    level=logging.DEBUG)
logging.info('Aye')
If I open a new gvim session and run this code with python-mode, then I would get a file called "testlogging.log" with the content
INFO:root:Aye
Looks promising! But if I delete the log file and run the code in python-mode again, then the log file won't be re-created. If at this time I run the code in a terminal like this
./testlogging.py
Then the log file would be generated again!
I checked with Python documentation and noticed this line in the logging tutorial (https://docs.python.org/2.7/howto/logging.html#logging-to-a-file):
A very common situation is that of recording logging events in a file, so let’s look at that next. Be sure to try the following in a newly-started Python interpreter, and don’t just continue from the session described above:...
So I guess this problem with the logging file only being updated once has something to do with python-mode staying in the same interpreter when I run the code a second time. So my question is: is there any way to solve this problem, by fiddling with the logging module, by putting something in the code to reset the interpreter, or by telling python-mode to reset it?
I am also curious why the logging module requires a newly-started interpreter to work...
Thanks for your help in advance.
The log file is not recreated because the logging module still has the old log file open and will continue to write to it (even though you've deleted it). The solution is to force the logging module to release all acquired resources before running your code again in the same interpreter:
# configure logging
# log a message
# delete log file
logging.shutdown()
# configure logging
# log a message (log file will be recreated)
In other words, call logging.shutdown() at the end of your code and then you can re-run it within the same interpreter and it will work as expected (re-creating the log file on every run).
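A concrete sketch of that sequence (the root logger's handler list is also cleared here so that basicConfig can take effect again, which keeps the example working across Python versions):
import logging
import os

logging.basicConfig(filename="testlogging.log", filemode="w", level=logging.DEBUG)
logging.info("first run")

logging.shutdown()                 # flush and close the open FileHandler
del logging.root.handlers[:]       # let basicConfig reconfigure logging below
os.remove("testlogging.log")

logging.basicConfig(filename="testlogging.log", filemode="w", level=logging.DEBUG)
logging.info("second run")         # the log file is recreated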
You opened the log file in "w" mode. "w" means write from the beginning of the file, so you only see the log of the last execution.
That is why you see the same contents in the log file.
You should change the fifth to seventh lines of your code as follows:
filename_logging = os.path.join(os.path.dirname(__file__), "testlogging.log")
logging.basicConfig(filename=filename_logging, filemode='a',
                    level=logging.DEBUG)
The code above uses "a" (append) mode, so new log data is added at the end of the log file.
