Best way to specify path to binary for a wrapper module

I am writing a module that wraps the functionality of an external binary.
For example, I wrap the ls program in a Python module, my_wrapper.py:
import my_wrapper
print(my_wrapper.ls('some_directory/'))
# list files in some_directory
and in my_wrapper.py I do:
# my_wrapper.py
import subprocess
PATH_TO_LS = '/bin/ls'
def ls(path):
    proc = subprocess.Popen([PATH_TO_LS, path], ...)
    ...
    return paths
(of course, I do not wrap ls but some other binary)
The binary might be installed in an arbitrary location, such as /usr/bin/, /opt/, or even alongside the Python script (./binaries/).
Question:
What would be the cleanest (from the user's perspective) way to set the path to the binary?
Should the user specify my_wrapper.PATH_TO_LS = ... or invoke some my_wrapper.set_binary_path(path) at the beginning of their script?
Maybe it would be better to specify it in the environment, and the wrapper would find it with os.environ?
If the wrapper is distributed as an egg, can I require during installation that the executable is already present on the system (see below)?
egg example:
# setup.py
setup(
    name='my_wrapper',
    requires_binaries=['the_binary']  # <--- require that the binary is already
                                      # installed and visible
                                      # on the execution path
)
or
easy_install my_wrapper BINARY_PATH=/usr/local/bin/the_binary

Create a "configuration object" with sane defaults. Allow the consumer to modify the values as appropriate. Have your functions accept a configuration object instance, defaulting to the one you created.
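A minimal sketch of that idea follows. The class name WrapperConfig, the environment variable MY_WRAPPER_BINARY, and the fallback lookup of ls on the PATH are all assumptions made for illustration, not part of the original answer.

```python
import os
import shutil
import subprocess


class WrapperConfig:
    """Holds the settings the wrapper needs (all names here are hypothetical)."""

    def __init__(self, binary_path=None):
        # Sane default: an explicit argument wins, then an environment
        # variable (MY_WRAPPER_BINARY is an assumed name), then a PATH lookup.
        self.binary_path = (
            binary_path
            or os.environ.get("MY_WRAPPER_BINARY")
            or shutil.which("ls")
        )


# Module-level default that consumers may modify in place.
default_config = WrapperConfig()


def ls(path, config=None):
    """List `path` with the binary named by `config` (default_config if omitted)."""
    config = config or default_config
    output = subprocess.check_output([config.binary_path, path])
    return output.decode().splitlines()
```

A consumer could then either set default_config.binary_path once at the top of a script, or pass a separate WrapperConfig instance to individual calls.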

Related

Methods to avoid hard-coding file paths in Python

Working with scientific data, specifically climate data, I am constantly hard-coding paths to data directories in my Python code. Even if I were to write the most extensible code in the world, the hard-coded file paths prevent it from ever being truly portable. I also feel that having information about your machine's file system coded into your programs could be a security issue.
What solutions are out there for handling the configuration of paths in Python to avoid having to code them out explicitly?

One solution relies on configuration files.
You can store all your paths in a JSON file like so:
{
    "base_path" : "/home/bob/base_folder",
    "low_temp_area_path" : "/home/bob/base/folder/low_temp"
}
and then in your Python code, you could just do:
import json
with open("conf.json") as json_conf:
    CONF = json.load(json_conf)
and then you can use your paths (or any configuration variable you like) like so:
print("The base path is {}".format(CONF["base_path"]))

First off, it's always good practice to add a main function to each file to test the class or functions it contains. Along with this, you determine the directory the script lives in. This becomes incredibly important when running Python from a cron job or from a directory that is not the current working directory. No JSON files or environment variables are then needed, and the approach works across Mac, RHEL and Debian distributions.
This is how you do it, and it will also work on Windows if you use '\' instead of '/' (if that is even necessary in your case).
import os
import sys
if "__main__" == __name__:
    workingDirectory = os.path.dirname(os.path.realpath(sys.argv[0]))
As you can see, the working directory is resolved from the script's own location whether you invoke the script with a full path or a relative path, meaning it will work in a cron job automatically.
After that, if you want to work with data that is stored under that directory, use:
fileName = os.path.join(workingDirectory, './sub-folder-of-current-directory/filename.csv')
fp = open(fileName, 'r')
or, for a folder one level above the working directory (parallel to your project directory):
fileName = os.path.join(workingDirectory, '../folder-at-same-level-as-my-project/filename.csv')
fp = open(fileName, 'r')

I believe there are many ways around this, but here is what I would do:
Create a JSON config file with all the paths I need defined.
For even more portability, I'd have a default path where I look for this config file but also have a command line input to change it.
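The two steps above could be sketched roughly as follows. The default location (~/.my_project/conf.json), the --config option name, and the function name load_paths are assumptions chosen for illustration.

```python
import argparse
import json
import os

# Assumed default location of the config file; adjust to taste.
DEFAULT_CONFIG = os.path.join(os.path.expanduser("~"), ".my_project", "conf.json")


def load_paths(argv=None):
    """Return the path settings from the config file selected on the command line."""
    parser = argparse.ArgumentParser(description="Load path settings from a JSON file")
    parser.add_argument(
        "--config",
        default=DEFAULT_CONFIG,
        help="JSON file containing the paths",
    )
    args = parser.parse_args(argv)
    with open(args.config) as json_conf:
        return json.load(json_conf)
```

Called without arguments, load_paths() reads the default file; running the script with --config /some/other/conf.json switches to a different set of paths without touching the code.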

In my opinion, passing arguments on the command line would be the best solution. You should take a look at argparse, which gives you a nice way to handle command-line arguments. For example:
myDataScript.py /home/userName/datasource1/
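A small sketch of what the script behind that invocation might look like, assuming a single positional argument named data_dir (the argument and function names are hypothetical):

```python
import argparse
import os


def parse_data_dir(argv=None):
    """Return the absolute data directory given as the positional argument."""
    parser = argparse.ArgumentParser(description="Process files in a data directory")
    parser.add_argument("data_dir", help="directory containing the input data")
    args = parser.parse_args(argv)
    if not os.path.isdir(args.data_dir):
        # parser.error prints a usage message and exits with status 2.
        parser.error("not a directory: {}".format(args.data_dir))
    return os.path.abspath(args.data_dir)
```

Validating the directory up front means a mistyped path fails immediately with a usage message rather than later, deep inside the data processing.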

Python temporary directory to execute other processes?

I have a string of Java source code in Python that I want to compile, execute, and collect the output (stdout and stderr). Unfortunately, as far as I can tell, javac and java require real files, so I have to create a temporary directory.
What is the best way to do this? The tempfile module seems to be oriented towards creating files and directories that are only visible to the Python process. But in this case, I need Java to be able to see them too. However, I also want the other stuff to be handled intelligently if possible (such as deleting the folder when done or using the appropriate system temp folder).

tempfile.NamedTemporaryFile and tempfile.TemporaryDirectory work perfectly fine for your purposes. The resulting objects have a .name attribute that provides a file system visible name that java/javac can handle just fine, just make sure to:
Set the suffix appropriately if the compiler insists on files being named with a .java extension
Always call .flush() on the file handle before handing the .name of a NamedTemporaryFile to an external process or it may (usually will) see an incomplete file
If you don't want Python cleaning up the files when you close the objects, either pass delete=False to NamedTemporaryFile's constructor, or use the mkstemp and mkdtemp functions (which create the objects, but don't clean them up for you).
So for example, you might do:
import os
import subprocess
import tempfile
# Create temporary directory for source and class files
with tempfile.TemporaryDirectory() as d:
    # Write source code; inside the with block, d is the directory path (a str)
    srcpath = os.path.join(d, "myclass.java")
    with open(srcpath, "w") as srcfile:
        srcfile.write('source code goes here')
    # Compile source code; javac writes the .class file next to the source
    subprocess.check_call(['javac', srcpath])
    # Run the compiled class: java takes the class name (no .java or .class
    # extension), with the temporary directory on the classpath
    invokename = os.path.splitext(os.path.basename(srcpath))[0]
    subprocess.check_call(['java', '-cp', d, invokename])
# with block for TemporaryDirectory done, temp directory cleaned up
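The NamedTemporaryFile advice above (set the suffix, flush before handing over the name) can be sketched in the same way. To stay runnable without a JDK, this sketch uses the current Python interpreter as a stand-in for the external program; run_source is a hypothetical helper name. It also assumes a POSIX system, since on Windows a NamedTemporaryFile generally cannot be opened by another process while it is still open.

```python
import subprocess
import sys
import tempfile


def run_source(source):
    """Write `source` to a temporary .py file, run it, and return (stdout, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py") as src:
        src.write(source)
        src.flush()  # make sure the child process sees the complete file
        proc = subprocess.run(
            [sys.executable, src.name],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    # The temporary file is deleted when the with block exits.
    return proc.stdout, proc.stderr
```

Capturing both pipes this way also covers the "collect stdout and stderr" part of the question; the same subprocess.run arguments would apply to the javac and java calls.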

tempfile.mkstemp creates a file that is normally visible in the filesystem and returns you the path as well. You should be able to use this to create your input and output files - assuming javac will atomically overwrite the output file if it exists there should be no race condition if other processes on your system don't misbehave.

How to use PyCharm for GIMP plugin development?

I need to develop a plugin for GIMP and would like to stay with PyCharm for Python editing, etc.
FYI, I'm on Windows.
After directing PyCharm to use the Python interpreter included with GIMP, I also added a path to gimpfu.py to get rid of the error on from gimpfu import *.
This fixes the error on the import, even when set to Excluded.
I experimented with setting this directory to Sources, Resources and Excluded and still get errors for constants such as RGBA_IMAGE, TRANSPARENT_FILL, NORMAL_MODE, etc.
Any idea on how to contort PyCharm into playing nice for GIMP plugin development?
I'm not really running any code from PyCharm; it's really just being used as a nice code editor, to facilitate revision control, etc.

As you found, these variables are part of .pyd files (DLL files for Python). PyCharm can't get signatures for the contents of these files.
For Python builtins (like abs, all, any, etc.) PyCharm has its own .py files that it uses only for signatures and docs. You can see this if you click on one of these functions and go to its declaration:
PyCharm will open the builtins.py file in its folder with the following content:
def abs(*args, **kwargs): # real signature unknown
    """ Return the absolute value of the argument. """
    pass
def all(*args, **kwargs): # real signature unknown
    """
    Return True if bool(x) is True for all values x in the iterable.
    If the iterable is empty, return True.
    """
    pass
def any(*args, **kwargs): # real signature unknown
    """
    Return True if bool(x) is True for any x in the iterable.
    If the iterable is empty, return False.
    """
    pass
As you can see, the functions are defined and documented but have no implementation, because their implementation is written in C and lives somewhere in a binary file.
PyCharm can't provide such a wrapper for every library. Usually the people who create .pyd files provide their own .py wrappers (for example, the PyQt module: no native Python implementation, just signatures).
It looks like GIMP doesn't have such a wrapper for some of the variables. The only way I see is to create some sort of wrapper of your own manually. For example, create gimpfu_signatures.py with the following content:
RGBA_IMAGE = 1
TRANSPARENT_FILL = 2
NORMAL_MODE = 3
And import it while you're creating the plugin:
from gimpfu import *
from gimpfu_signatures import * # comment on release
Not elegant, but better than nothing.
...
One more note about gimpfu.py's path. If I understand correctly, you just added this path to the project. It may work, but the correct way is to add it to the project's PYTHONPATH (in project preferences). See this link for a detailed manual.

Pythonic way to set module-wide settings from external file

Some background (not mandatory, but might be nice to know): I am writing a Python command-line module which is a wrapper around latexdiff. It basically replaces all \cite{ref1, ref2, ...} commands in LaTeX files with written-out and properly formatted references before passing the files to latexdiff, so that latexdiff will properly mark changes to references in the text (otherwise, it treats the whole \cite{...} command as a single "word"). All the code is currently in a single file which can be run with python -m latexdiff-cite, and I have not yet decided how to package or distribute it. To make the script useful for anybody else, the citation formatting needs to be configurable. I have implemented an optional command-line argument -c CONFIGFILE to allow the user to point to their own JSON config file (a default file resides in the module folder and is loaded if the argument is not used).
Current implementation: My single-file command-line Python module currently parses command-line arguments in if __name__ == '__main__', and loads the config file (specified by the user in -c CONFIGFILE) here before running the main function of the program. The config variable is thus available in the entire module and all is well. However, I'm considering publishing to PyPI by following this guide which seems to require me to put the command-line parsing in a main() function, which means the config variable will not be available to the other functions unless passed down as arguments to where it's needed. This "passing down by arguments" method seems a little cluttered to me.
Question: Is there a more pythonic way to set some configuration globals in a module or otherwise accomplish what I'm trying to? (I don't want to rely on 3rd party modules.) Am I perhaps completely off the tracks in some fundamental way?

One way to do it is to have the configuration defined in a class or a simple dict:
import json
class Config(object):
    setting1 = "default_value"
    setting2 = "default_value"
    @staticmethod
    def load_config(json_file):
        """ load settings from config file """
        with open(json_file) as f:
            config = json.load(f)
        for k, v in config.items():
            setattr(Config, k, v)
Then your application can access the settings via this class: Config.setting1 ...
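To show how this fits the main() layout mentioned in the question, here is a hedged sketch. The -c option comes from the question; format_citation and the returned string format are hypothetical stand-ins for the real program logic.

```python
import argparse
import json


class Config(object):
    """Module-wide settings; the class attributes act as defaults."""
    setting1 = "default_value"
    setting2 = "default_value"

    @staticmethod
    def load_config(json_file):
        """Overwrite the defaults with values from a JSON file."""
        with open(json_file) as f:
            for k, v in json.load(f).items():
                setattr(Config, k, v)


def format_citation(key):
    # Reads the setting directly; nothing is passed down from main().
    return "{0} [{1}]".format(key, Config.setting1)


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", dest="configfile", default=None)
    args = parser.parse_args(argv)
    if args.configfile:
        Config.load_config(args.configfile)
    return format_citation("ref1")
```

Because the settings live on the class rather than in a local variable of main(), moving the argument parsing into main() does not force every function to take a config parameter.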

SCons: How to generate dependencies after some targets have been built?

I have a SCons project that builds a set of Python modules (mostly as shared objects compiled from C++).
This part works flawlessly, all dependencies seem to be fine.
However, I have now reached the point where I started trying to add tests for those modules. Those tests are supposed to be run as part of the build. The tests are written in Python, and they need to be run in an environment that already has all modules built (so that import statements can find them).
I got that part working up to the point of dependencies for those tests. The tests run; however, I can't seem to find a way to generate dependencies for them from import statements.
I have found modulefinder, which does exactly what I want. What's more, I am able to run it in the built environment and get the expected results after my project is built.
I wanted to use modulefinder in an emitter to scan for files the test/Python script depends on.
The problem is that dependency scanning/building and the running of emitters happen before SCons has built all modules and before it can set up the environment properly for tests, hence modulefinder can't work either.
I can't seem to work out how to make SCons look for dependencies for specific targets after some other targets are already built.
edit:
I found ParseDepends in the SCons documentation, which seems to talk about the same type of issue (well, almost exactly the same, except for the language):
This limitation of ParseDepends leads to unnecessary recompilations. Therefore, ParseDepends should only be used if scanners are not available for the employed language or not powerful enough for the specific task.
I am still hopeful though that there is some clean solution to my problem.

You can't change dependencies in SCons during the compilation phase. SCons creates its dependency tree and then runs it. You can't change it.
I suggest you write a Scanner for your builder. For C++, SCons uses a Scanner to find include dependencies.
http://www.scons.org/doc/1.1.0/HTML/scons-user/c3742.html
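As a rough illustration of the scanning half of such a Scanner, the sketch below extracts imported module names from Python source using the standard ast module. find_imports is a hypothetical helper, and the SConstruct wiring in the trailing comment is an untested assumption about how it might be attached; it also does not address the asker's timing problem, where the compiled modules do not exist yet at scan time.

```python
import ast


def find_imports(source):
    """Return the sorted top-level module names imported by Python `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hypothetical wiring inside an SConstruct (assumed, not executed here):
#
#   def py_scan(node, env, path):
#       names = find_imports(node.get_text_contents())
#       return [env.File(name + ".py") for name in names]  # assumes sibling .py files
#
#   env.Append(SCANNERS=Scanner(function=py_scan, skeys=[".py"]))
```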

After a lot of playing around, I found a not-so-clean-but-not-too-horrible way that seems to work, which I've wrapped in a helper Scanner-derived class:
from SCons.Node import Node
from SCons import Scanner
import logging

_LOGGER = logging.getLogger(__name__)

class DeferredScanner(Scanner.Current):
    """
    This is a helper class for implementing source scanners that need
    to wait for specific things to be built before source scanning can happen.
    One practical example of usage is when you are generating Python
    modules (i.e. shared libraries) which you want to test.
    You have to wait until all your modules are ready before dependencies
    of your tests can be scanned.
    The approach with this scanner is to collect all generated modules
    and `wait_for` them before scanning dependencies of whatever this scanner
    is used for.
    Sample usage:
        py_scanner = DeferredScanner(
            wait_for = all_generated_modules,
            function = _python_source_scanner,
            recursive = True,
            skeys = ['.py'],
            path_function = FindENVPathDirs('PYTHONPATH'),
        )
    """
    def __init__(self, wait_for, **kw):
        Scanner.Current.__init__(
            self,
            node_factory = Node,
            **kw
        )
        self.wait_for = wait_for
        self.sources_to_rescan = []
        self.ready = False
        env = wait_for[0].get_build_env()
        env.AlwaysBuild(wait_for)
        self._hook_implicit_reset(
            wait_for[0],
            'built',
            self.sources_to_rescan,
            lambda: setattr(self, 'ready', True),
        )

    def _hook_implicit_reset(self, node, name, on, extra = None):
        # We can only modify dependencies in the main thread
        # of Taskmaster processing.
        # However there seems to be no hook to "sign up" for a
        # callback when post processing of a built node is happening
        # (at least I couldn't find anything like this in SCons
        # 2.2.0).
        # AddPostAction executes actions in the Executor thread, not
        # the main one, so we can't use that.
        #
        # `built` is guaranteed to be executed in that thread,
        # so we hijack this one.
        #
        node.built = lambda old=node.built: (
            old(),
            self._reset_stored_dependencies(name, on),
            extra and extra(),
        )[0]

    def _reset_stored_dependencies(self, name, on):
        _LOGGER.debug('Resetting dependencies for {0}-{1} files: {2}'.format(
            # oh, but it does have those
            self.skeys, # pylint: disable=no-member
            name,
            len(on),
        ))
        for s in on:
            # I can't find any official way to invalidate
            # FS.File (or Node in general) list of implicit dependencies
            # that were found.
            # Yet we want to force that Node to scan its implicit
            # dependencies again.
            # This seems to do the trick.
            s._memo.pop('get_found_includes', None) # pylint: disable=protected-access

    def __call__(self, node, env, path = ()):
        ready = self.ready
        _LOGGER.debug('Attempt to scan {0} {1}'.format(str(node), ready))
        deps = self.wait_for + [node]
        if ready:
            deps.extend(Scanner.Current.__call__(self, node, env, path))
        self.sources_to_rescan.append(node)
        # In case `wait_for` is not dependent on this node
        # we have to make sure we will rescan dependencies when
        # this node is built itself.
        # It boggles my mind that SCons scans nodes before
        # they exist, and caches the result even if there was no
        # list returned.
        self._hook_implicit_reset(
            node,
            'self',
            [node],
        )
        return deps
This seems to work exactly as I'd have hoped and do the job.
It is probably as efficient as you can get.
Probably should also note that this works with SCons 2.2.0, but I suspect it shouldn't be much different for newer ones.
