Redirect stderr to a log file in Python

I have a Django website running on Tornado and nginx.
I took this Tornado launcher script (tornading.py).
I'm also using python-openid, which writes some of its output to sys.stderr.
As a result I get an IOError.
How can I redirect it using the logging package?
I thought about
f = open("myfile.log", "w")
sys.stderr = f
or
python tornado.py > /dev/null 2>&1
But what is the best way to solve it?

The best way would be if the openid library didn't print to stderr but used some kind of logging API instead (e.g. the logging module). I agree with thkala that modifying third-party code is not good in the long term, so you should fix it and then contribute the fix back to the openid authors.
In the interest of advancing the open-source community, that's the best way to solve it.

Using shell redirection is more of a workaround than a solution, and it may not always be possible, depending on how the script is launched.
It has the distinct advantage, however, of you not having to modify third-party code. Local modifications - even minor ones - can become a major issue when you decide to e.g. update said code to its latest version from upstream.
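If you do want to route the library's stderr output through the logging package without touching its code, one common pattern is a small file-like wrapper assigned to sys.stderr. A minimal sketch (the StreamToLogger name is my own):
import logging
import sys

logging.basicConfig(filename="myfile.log", level=logging.INFO)

class StreamToLogger:
    """File-like object that forwards writes to a logger."""

    def __init__(self, logger, level=logging.ERROR):
        self.logger = logger
        self.level = level

    def write(self, message):
        message = message.rstrip()
        if message:  # skip the bare newlines that print-style output emits
            self.logger.log(self.level, message)

    def flush(self):
        pass  # the logging handlers flush themselves

sys.stderr = StreamToLogger(logging.getLogger("stderr"))
This keeps the third-party code untouched, and it works regardless of how the process is launched, unlike the shell redirection.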

Related

Prevent any file system usage in Python's pytest

I have a program that, for data security reasons, should never persist anything to local storage when deployed in the cloud; any input/output needs to be written to the connected (encrypted) storage instead.
To allow deployment locally as well as to multiple clouds, I am using the very useful fsspec. However, other developers are working on the project as well, and I need a way to make sure they aren't accidentally using local file I/O methods - which may pass unit tests but fail when deployed to the cloud.
For this, my idea is basically to mock/replace any I/O methods in pytest with ones that don't work and make the test fail. However, this is probably not straightforward to implement. I am wondering whether anyone else has had this problem, and whether best practices or a library already exist for it?
During my research I found pyfakefs, which looks very close to what I am trying to do - except I don't want to simulate another file system, I want there to be no local file system at all.
Any input appreciated.
You cannot use pytest plugins to make this secure. There will always be ways around them: even if you patch everything in the Python standard library, the code can still use third-party C libraries, which can't be patched from the Python side.
Even if you somehow restrict every way the Python process can write a file, it can still ask the OS or another process to write something on its behalf.
The only real options are to run only trusted code, or to run the process inside a sandbox.
On Unix-like operating systems, a workable approach may be to create a chroot and run the program inside it.
If you're OK with just preventing files from being opened via the open function, you can patch that function in the builtins module:
import builtins
import pytest

_original_open = builtins.open

class FileSystemUsageError(Exception):
    pass

def patched_open(*args, **kwargs):
    raise FileSystemUsageError()

@pytest.fixture
def disable_fs():
    builtins.open = patched_open
    yield
    builtins.open = _original_open
I based this example on a pytest plugin written at the company where I now work, which prevents network use in tests. You can see the full example here: https://github.com/best-doctor/pytest_network/blob/4e98d816fb93bcbdac4593710ff9b2d38d16134d/pytest_network.py
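For example, a test that opts into the fixture will then fail fast on any accidental local file I/O (a small sketch, assuming the fixture code above lives in conftest.py):
import pytest

def test_local_io_is_disabled(disable_fs):
    # FileSystemUsageError comes from wherever the fixture code above
    # is defined (e.g. conftest.py).
    with pytest.raises(FileSystemUsageError):
        open("accidental.txt", "w")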

Is it possible to import a module in python without using "import" or "eval"? [duplicate]

I understand that letting any anonymous user upload any sort of file is generally dangerous, especially if it's code. However, I have an idea to let users upload custom AI scripts to my website. I would provide the template so that the user could compete with other AIs in an online web game I wrote in Python. I need either a way to ensure a user couldn't compromise other files or inject malicious code via their uploaded script, or a solution for client-side execution of the game. Any suggestions? (I'm looking for a solution that will work with my Python scripts.)
I am in no way associated with this site, and I'm only linking it because it tries to achieve what you are after: jailing of Python. The site is codepad.
According to the about page, it is run under geordi and traps all syscalls with ptrace. In addition to being chroot'ed, they are on a virtual machine with firewalls in place to disallow outbound connections.
Consider it a starting point but I do have to chime in on the whole danger thing. Gotta CYA myself. :)
Using PyPy you can create a python sandbox. The sandbox is a separate and supposedly secure python environment where you can execute their scripts. More info here
http://codespeak.net/pypy/dist/pypy/doc/sandbox.html
"In theory it's impossible to do anything bad or read a random file on the machine from this prompt."
"This is safe to do even if script.py comes from some random untrusted source, e.g. if it is done by an HTTP server."
Along with other safeguards, you can also incorporate human review of the code. Assuming part of the experience is reviewing other members' solutions, and everyone is a python developer, don't allow new code to be activated until a certain number of members vote for it. Your users aren't going to approve malicious code.
Yes.
Allow them to script their client, not your server.
PyPy is probably a decent bet on the server side as suggested, but I'd look into having your python backend provide well defined APIs and data formats and have the users implement the AI and logic in Javascript so it can run in their browser. So the interaction would look like: For each match/turn/etc, pass data to the browser in a well defined format, provide a javascript template that receives the data and can implement logic, and provide web APIs that can be invoked by the client (browser) to take the desired actions. That way you don't have to worry about security or server power.
Have an extensive API for the users and strip all other calls upon upload (such as import statements). Also, strip everything that has anything to do with file i/o.
(You might want to do multiple passes to ensure that you didn't miss anything.)
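A first pass at that kind of upload filter could use the ast module instead of string stripping, rejecting offending scripts outright. A rough sketch (the names are my own, and as the other answers stress, static filtering alone is not a real security boundary):
import ast

FORBIDDEN_NAMES = {"open", "eval", "exec", "compile", "__import__"}

def script_is_allowed(source):
    """Reject scripts that import modules or reference known-dangerous builtins."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            return False
    return True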

What is the best way of running shell commands from a web based interface?

Imagine a web application that allows a logged in user to run a shell command on the web server at the press of a button. This is relatively simple in most languages via some standard library os tools.
But if that command is long-running, you don't want your UI to hang. Again, this is relatively easy to deal with using some sort of background process, or by putting the command onto a message queue (and maybe saving the output and status somewhere for later consumption). Just return quickly, saying "we'll run that and get back to you".
What I'd like to do is show the output of said web-UI-triggered shell command as it happens. So: vertically scrolling text, just like when running it in a terminal.
I have a vague idea of how I might approach this, streaming the output to a websocket perhaps and simply printing the output to screen.
What I'd like to ask is:
Are there any plugins, libraries or applications that already do this? Something I can either use or read the source of. Ideally an open-source Python/Django or Ruby/Rails tool, but other stacks would be interesting too.
I'm not sure if it's what you want, but there are some web based ssh clients out there. If you care about security and really just want dynamic feedback, you could look into comet or just have a frame with its own http session that doesn't end until it's done printing.
A web-based SSH client into the host would work (there are Java SSH clients out there).
Ruby has a web-based terminal:
http://tryruby.org (link to the source is at the bottom of the page).
You could also embed Ruby via jruby: http://tim.lossen.de/2007/03/jruby/applet.html
http://github.com/jruby/jruby/blob/master/samples/irb-applet.html
I haven't heard of any libraries that do this, but you'll need to set up the system command and call out to the system. You will then need to "pump" the stdout and stderr streams and pipe that data back out to your web client.
As an example of this style of problem, look at code snippets showing how people use Ruby/Python/etc. to transcode a video, e.g. http://kpumuk.info/ruby-on-rails/encoding-media-files-in-ruby-using-ffmpeg-mencoder-with-progress-tracking/ - my example is taken from this blog post.
class MediaFormatException < StandardError
end

def execute_mencoder(command)
  progress = nil
  IO.popen(command) do |pipe|
    # mencoder terminates progress lines with "\r", so read on that
    # separator and report whenever the percentage changes.
    pipe.each("\r") do |line|
      if line =~ /Pos:[^(]*\(\s*(\d+)%\)/
        p = $1.to_i
        p = 100 if p > 100
        if progress != p
          progress = p
          print "PROGRESS: #{progress}\n"
          $defout.flush
        end
      end
    end
  end
  raise MediaFormatException if $?.exitstatus != 0
end
I don't know if this example pulls data from both stdout and stderr, but you will definitely need to pull from both of those interfaces; typically, if either buffer fills up, the executing command may hang or fail (I have experienced this with Python). This method will also look different if the only thing you do is return each line to the web client - in a terminal, the progress indicator of ffmpeg/mencoder stays on the bottom line, whereas this method will give you a long list of progress updates. Pipe the output to your terminal and you'll see what I mean.
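In Python, the core pumping loop might look roughly like this (a sketch; merging stderr into stdout is one way to avoid the full-buffer hang described above):
import subprocess

def stream_command(command):
    # Merge stderr into stdout so that neither pipe's buffer can fill
    # up and block the child process.
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in proc.stdout:
        yield line.rstrip("\n")  # each line can be pushed to a websocket
    proc.wait()

# Example: print a slow command's output as it is produced.
for line in stream_command(["ping", "-c", "3", "localhost"]):
    print(line)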
So, I've tried to answer my own question with code, as I couldn't find anything to quite fit the bill. Hopefully it's useful to anyone coming across the same problem.
Redbeard 0X0A pointed me in the general direction; I was able to get a standalone Ruby script doing what I wanted using popen. Extending this to EventMachine (as it provides a convenient way of writing a websocket server) and using its built-in popen method solved my problem.
More details here http://morethanseven.net/2010/09/09/Script-running-web-interface-with-websockets.html and the code at http://github.com/garethr/bolt/
Certainly not the best way to run shell commands, but likely the easiest:
#!/bin/sh
echo Content-Type: text/plain
echo
/usr/bin/uptime
http://www.sente.cc/scripts/uptime.cgi
Take a look at Galaxy (online demo) or Yabi.
Apart from the requirement to show output during the job run, they are both excellent solutions to this! They are also both written in Python (and Yabi even in Django).
They were both built with bioinformatics in mind, but really are both general job runner/workflow tools.
They will let you specify parameters in a web interface, see queued/running/finished jobs in a separate column, and after the jobs are finished, inspect details and results, or re-run the job, with possibly changed parameters.
Galaxy is the easier one to install. The Galaxy installation boils down to downloading it and running "sh run.sh", and adding your own tool boils down to creating an XML file along the lines of:
<tool id="mytool" name="My Tool" version="1.0.0">
  <description>Does this and that</description>
  <command>somecommand --aparam $aparam</command>
  <inputs>
    <param name="aparam" type="text" label="A parameter"/>
  </inputs>
  <outputs>
    <data name="outfile" format="tabular"/>
  </outputs>
</tool>
... and place it in the /tools folder, and add a line to tool_conf.xml to tell Galaxy about your new tool (there you can also get rid of the bioinformatics tools, so they don't clutter your tools menu).
Yabi is more complicated to install (see the readme file), but the process might be smooth if you are on the right kind of system. On the other hand, it lets you do the tool configuration in the web interface, rather than in an XML file as in Galaxy.
Galaxy still has the bigger community, though, which is reflected in the number of features and already-integrated tools (see the toolshed for shared tools/wrappers).
websocketd looks like the perfect tool for that.

Mercurial scripting with python

I am trying to get the Mercurial revision number/id (it's a hash, not a number) programmatically in Python.
The reason is that I want to add it to the css/js files on our website like so:
<link rel="stylesheet" href="example.css?{% mercurial_revision "example.css" %}" />
So that whenever a change is made to the stylesheet, it will get a new url and no longer use the old cached version.
OR if you know where to find good documentation for the mercurial python module, that would also be helpful. I can't seem to find it anywhere.
My Solution
I ended up using subprocess to just run a command that gets the hg node. I chose this solution because the API is not guaranteed to stay the same, but the command-line interface probably will:
import subprocess

def get_hg_rev(file_path):
    pipe = subprocess.Popen(
        ["hg", "log", "-l", "1", "--template", "{node}", file_path],
        stdout=subprocess.PIPE
    )
    return pipe.stdout.read()
example use:
>>> path_to_file = "/home/jim/workspace/lgr/pinax/projects/lgr/site_media/base.css"
>>> get_hg_rev(path_to_file)
'0ed525cf38a7b7f4f1321763d964a39327db97c4'
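To wire that into the {% mercurial_revision %} tag from the question, a minimal custom template tag might look like this (a sketch; the module layout and the cache are my own assumptions, since you probably don't want to spawn hg on every render):
# templatetags/hg_tags.py -- hypothetical location inside a Django app
from django import template
from myapp.utils import get_hg_rev  # the function above; path is hypothetical

register = template.Library()
_cache = {}  # a file's revision won't change while the server is running

@register.simple_tag
def mercurial_revision(file_path):
    if file_path not in _cache:
        _cache[file_path] = get_hg_rev(file_path)
    return _cache[file_path]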
It's true there's no official API, but you can get an idea about best practices by reading other extensions, particularly those bundled with hg. For this particular problem, I would do something like this:
from mercurial import ui, hg
from mercurial.node import hex
repo = hg.repository('/path/to/repo/root', ui.ui())
fctx = repo.filectx('/path/to/file', 'tip')
hexnode = hex(fctx.node())
Update: at some point the parameter order changed; now it's like this:
repo = hg.repository(ui.ui(), '/path/to/repo/root')
Do you mean this documentation?
Note that, as stated in that page, there is no official API, because they still reserve the right to change it at any time. But you can see the list of changes in the last few versions, it is not very extensive.
An updated, cleaner subprocess version (uses .check_output(), added in Python 2.7/3.1) that I use in my Django settings file for a crude end-to-end deployment check (I dump it into an HTML comment):
import subprocess
HG_REV = subprocess.check_output(['hg', 'id', '--id']).strip()
You could wrap it in a try if you don't want some strange hiccup to prevent startup:
try:
    HG_REV = subprocess.check_output(['hg', 'id', '--id']).strip()
except OSError:
    HG_REV = "? (Couldn't find hg)"
except subprocess.CalledProcessError as e:
    HG_REV = "? (Error {})".format(e.returncode)
except Exception:  # don't use a bare 'except'; it mis-catches Ctrl-C and more
    # should never have to deal with a hangup
    HG_REV = "???"
If you are using Python 2, you want to use hglib.
I don't know what to use if you're using Python 3, sorry. Probably hgapi.
Contents of this answer
Mercurial's APIs
How to use hglib
Why hglib is the best choice for Python 2 users
If you're writing a hook, that discouraged internal interface is awfully convenient
Mercurial's APIs
Mercurial has two official APIs.
The Mercurial command server. You can talk to it from Python 2 using the hglib (wiki, PyPI) package, which is maintained by the Mercurial team.
Mercurial's command-line interface. You can talk to it via subprocess, or hgapi, or somesuch.
How to use hglib
Installation:
pip install python-hglib
Usage:
import hglib
client = hglib.open("/path/to/repo")
commit = client.log("tip")[0]  # log() returns a list of revisions
print commit.author
More usage information on the hglib wiki page.
Why hglib is the best choice for Python 2 users
Because it is maintained by the Mercurial team, and it is what the Mercurial team recommend for interfacing with Mercurial.
From Mercurial's wiki, the following statement on interfacing with Mercurial:
For the vast majority of third party code, the best approach is to use Mercurial's published, documented, and stable API: the command line interface. Alternately, use the CommandServer or the libraries which are based on it to get a fast, stable, language-neutral interface.
From the command server page:
[The command server allows] third-party applications and libraries to communicate with Mercurial over a pipe that eliminates the per-command start-up overhead. Libraries can then encapsulate the command generation and parsing to present a language-appropriate API to these commands.
The Python interface to the Mercurial command-server, as said, is hglib.
The per-command overhead of the command-line interface is no joke, by the way. I once built a very small test suite (only about 5 tests) that used hg via subprocess to create, commit by commit, a handful of repos with e.g. merge situations. Throughout the project, the runtime of the suite stayed between 5 and 30 seconds, with nearly all the time spent in the hg calls.
If you're writing a hook, that discouraged internal interface is awfully convenient
The signature of a Python hook function is like so:
# In the hgrc:
# [hooks]
# preupdate.my_hook = python:/path/to/file.py:my_hook
def my_hook(
        ui, repo, hooktype,
        ... hook-specific args, find them in `hg help config` ...,
        **kwargs)
ui and repo are part of the aforementioned discouraged unofficial internal API. The fact that they are right there in your function args makes them terribly convenient to use, such as in this example of a preupdate hook that disallows merges between certain branches.
def check_if_merge_is_allowed(ui, repo, hooktype, parent1, parent2, **kwargs):
    from_ = repo[parent2].branch()
    to_ = repo[parent1].branch()
    ...
    # return True if the hook fails and the merge should not proceed.
If your hook code is not so important, and you're not publishing it, you might choose to use the discouraged unofficial internal API. If your hook is part of an extension that you're publishing, better use hglib.
Give the keyword extension a try.
FWIW, to avoid fetching that value on every page/view render, I just have my deploy put it into the settings.py file. Then I can reference settings.REVISION without all the overhead of accessing Mercurial and/or another process. Do you ever have this value change without reloading your server?
I wanted to do the same thing the OP wanted to do: get hg id -i from a script (get the tip revision of the whole REPOSITORY, not of a single FILE in that repo), but I did not want to use popen. The code from brendan got me started, but it wasn't what I wanted.
So I wrote this... comments/criticism welcome. It gets the tip rev in hex as a string.
from mercurial import ui, hg
# from mercurial.node import hex  # should I have used this?

def getrepohex(reporoot):
    repo = hg.repository(ui.ui(), reporoot)
    revs = repo.revs('tip')
    if len(revs) == 1:
        return str(repo.changectx(revs[0]))
    else:
        raise Exception("Internal failure in getrepohex")

How do I write to a log from mod_python under apache?

I seem to only be able to write to the Apache error log via stderr. Anyone know of a more structured logging architecture that I could use from my python web project, like commons?
This must have changed in the past four years. If you come across this question and want to do this, you can do it through the request object, i.e.:
def handler(req):
    req.log_error('Hello apache')
There isn't currently any built-in support for mod_python logging to Apache. If you really want to work within the Apache logs, you can check out this thread (make sure you get the second version of the posted code, rather than the first):
http://www.dojoforum.com/node/13239
http://www.modpython.org/pipermail/mod_python/2005-October/019295.html
If you're just looking for a more structured logging system, the standard Python logging module referred to by Blair is very feature-complete. Aside from the python.org docs Blair linked, here's a more in-depth look at the module's features from ONLamp:
http://www.onlamp.com/pub/a/python/2005/06/02/logging.html
And for a quickie example usage:
http://hackmap.blogspot.com/2007/06/note-to-self-using-python-logging.html
I've used the builtin Python logging module in (non-web) projects in the past, with success - it should work in a web-hosted environment as well.
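For instance, a minimal setup that writes to an application-specific file instead of Apache's error log might look like this (a sketch; the log path is made up and must be writable by the Apache user):
import logging

logging.basicConfig(
    filename="/var/log/myapp/app.log",  # hypothetical path
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logging.getLogger("myapp").info("Hello from mod_python")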
I concur with Blair Conrad's post about the Python logging module. The standard log handlers sometimes drop messages, however. It's worth using the logging module's SocketHandler and building a receiver that listens for messages and writes them to file.
Here's mine: Example SocketHandler receiver.
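The sending side of that setup is only a few lines (a sketch; the receiver process is the part you build yourself, as in the linked example):
import logging
import logging.handlers

# Log records are pickled and sent over TCP; a separate receiver
# process unpickles them and writes them to a file.
sock_handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT
)
logging.getLogger().addHandler(sock_handler)
logging.getLogger("myapp").error("Sent to the socket receiver")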
