python: OpenCV Root Directory

I am using OpenCV for various object detectors, and I am finding it difficult to write portable code.
For instance, to load a face detector, on a mac with OpenCV installed via homebrew, I have to write:
haar=cv.Load('/usr/local/Cellar/opencv/2.4.2/share/OpenCV/haarcascades/haarcascade_frontalface_default.xml')
This is not portable; if I wish to change to another machine I'll have to determine another absolute path and change this code.
Is there a variable that holds the OpenCV root directory? That way I could write something like:
haar=cv.Load(os.path.join(OpenCVRoot, "haarcascades",
"haarcascade_frontalface_default.xml"))
UPDATE: It looks like this is not just a problem for me; it is also a problem for the OpenCV documentation. The documentation contains the following broken example code:
>>> import cv
>>> image = cv.LoadImageM("lena.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)
>>> cascade = cv.Load("../../data/haarcascades/haarcascade_frontalface_alt.xml")
>>> print cv.HaarDetectObjects(image, cascade, cv.CreateMemStorage(0), 1.2, 2, 0, (20, 20))
[((217, 203, 169, 169), 24)]
This would be simple to avoid if there were a way to infer where examples like lena.jpg and the pre-trained classifiers were installed.
Source: http://opencv.willowgarage.com/documentation/python/objdetect_cascade_classification.html (Retrieved 3/5/13)

You can use cv2.__file__ to get the path to the module, then use os.path to resolve symlinks and do some path manipulation. This line of code returns the directory of the haarcascades files on my Mac OS Homebrew installation. It may work on other installations too.
import cv2
from os.path import realpath, normpath

# relies on the relative layout of the cv2 module and the share/ directory
normpath(realpath(cv2.__file__) + '../../../../../share/OpenCV/haarcascades')

It seems there is little hope of a single mechanism that is portable across time and space (versions and platforms/environments), but there has been some progress. I don't know which version introduced it, but 4.0 has it:
cv2.data.haarcascades - string pointing to a directory, e.g.:
>>> import cv2
>>> cv2.data
<module 'cv2.data' from 'C:\\Users\\USERNAME\\Anaconda3\\envs\\py36\\lib\\site-packages\\cv2\\data\\__init__.py'>
>>> cv2.data.haarcascades
'C:\\Users\\USERNAME\\Anaconda3\\envs\\py36\\lib\\site-packages\\cv2\\data\\'
>>> cv2.__version__
'4.0.0'
But unfortunately, for 3.2.x and 3.4.x there is no such module...
So you could do:
import cv2
from os.path import realpath, normpath

if hasattr(cv2, 'data'):
    print('Cascades are here:', cv2.data.haarcascades)
else:
    print('This may not work:')
    print(normpath(realpath(cv2.__file__) + '../../../../../share/OpenCV/haarcascades'))
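On versions that do ship cv2.data, loading a cascade is then straightforward; a minimal sketch (the file name is one of the standard bundled cascades):
import cv2

# cv2.data.haarcascades already ends with a path separator
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')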

You can use sys.platform to determine the platform and set a different default path depending on what sys.platform returns.
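A minimal sketch of that idea (the per-platform paths below are assumptions; adjust them to your installations):
import sys

if sys.platform == 'darwin':
    cv_root = '/usr/local/Cellar/opencv/2.4.2/share/OpenCV'
elif sys.platform.startswith('linux'):
    cv_root = '/usr/share/opencv'
else:  # assume Windows
    cv_root = r'C:\opencv\build\etc'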

I'm in the same boat, and it is an annoying boat.
One convention you can use to deal with such portability issues is to use configuration files. In your case, you could have a file ~/.myprojectrc containing:
[cv]
cvroot = /usr/local/Cellar/opencv/2.4.2/share/OpenCV/
Files in this format can be read by ConfigParser objects, which should go something like:
import ConfigParser  # configparser in Python 3
import os

CONFIG = ConfigParser.ConfigParser()
config_path = os.path.join(os.path.expanduser('~'), '.myprojectrc')
if not os.path.exists(config_path):
    raise Exception('You need to make a "~/.myprojectrc" file with your cv path or something')
CONFIG.read(config_path)
...
cv_root = CONFIG.get('cv', 'cvroot')
Then at least when someone uses the code on another machine they don't have to modify any code, they just need to create the config file with the opencv path, and they get a clear error message telling them to do so.

Why not just copy the folder containing the XML files into your local workspace and use a relative path? As I remember, it doesn't cost much space on your hard drive.
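For example, a minimal sketch assuming you have copied the haarcascades folder next to your script:
import os.path
import cv2

# resolve the cascade relative to this script, not the current working directory
here = os.path.dirname(os.path.abspath(__file__))
haar = cv2.CascadeClassifier(
    os.path.join(here, 'haarcascades', 'haarcascade_frontalface_default.xml'))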

Related

How to pass any directory format (windows, linux, mac) to a function and use it os independently (python3)

You can make a file name OS-independent with:
os.path.join(os.path.curdir, 'filename')
But I need something else. I want my function to take a path in any format (Windows, Linux, Mac), convert it to an OS-independent form, assign it to a variable, and work with it on any OS. Also, note that the file name is not necessarily in the current directory.
example:
def magically_read_any_path_format_and_make_it_os_independent(input_file):
    ...
    return os_independent_format

# my variable
file = magically_...(input_file)
P.S. I know I can check whether it is the one or the other os and make the corresponding conversions, but is there something more automated and pythonic? Something in os or pathlib or elsewhere?
Thank you
If you're using Python 3.4 or later, you can use pathlib, and it does exactly that: you create a Path object from your path and it handles everything for you.
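For instance, a minimal sketch (the file name is assumed for illustration):
from pathlib import Path

# Path renders the right separator for the running OS automatically
p = Path('data') / 'input' / 'file.txt'
print(p)  # data/input/file.txt on POSIX, data\input\file.txt on Windows
with p.open() as f:
    contents = f.read()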
On older versions of Python, as long as you use os.path for any path manipulation you need, and use the Unix format for any hard-coded constants together with os.path.normpath, it should work on any OS.
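A minimal sketch of that approach (the constant is an assumed example):
import os.path

# hard-code the Unix form; normpath converts the separators for the local OS
p = os.path.normpath('data/input/file.txt')  # data\input\file.txt on Windows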
Not sure exactly what you mean by OS-independent; the docs state that "os.path module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths", meaning it gives you what you need locally.
You may be able to use
os.path.join(os.path.curdir, '') which is rather hacky,
or
os.path.normpath(os.path.curdir) which may do more than you want, see the docs.

Is this the approved way to access data adjacent to/packaged with a Python script?

I have a Python script that needs some data that's stored in a file that will always be in the same location as the script. I have a setup.py for the script, and I want to make sure it's pip installable in a wide variety of environments, and can be turned into a standalone executable if necessary.
Currently the script runs with Python 2.7 and Python 3.3 or higher (though I don't have a test environment for 3.3 so I can't be sure about that).
I came up with this method to get the data. This script isn't part of a module directory with __init__.py or anything, it's just a standalone file that will work if just run with python directly, but also has an entry point defined in the setup.py file. It's all one file. Is this the correct way?
def fetch_wordlist():
    wordlist = 'wordlist.txt'
    try:
        import importlib.resources as res
        return res.read_binary(__file__, wordlist)
    except ImportError:
        pass
    try:
        import pkg_resources as resources
        req = resources.Requirement.parse('makepw')
        wordlist = resources.resource_filename(req, wordlist)
    except ImportError:
        import os.path
        wordlist = os.path.join(os.path.dirname(__file__), wordlist)
    with open(wordlist, 'rb') as f:
        return f.read()
This seems ridiculously complex. Also, it seems to rely on the package management system in ways I'm uncomfortable with. The script no longer works unless it's been pip-installed, and that also doesn't seem desirable.
Resources living on the filesystem
The standard way to read a file adjacent to your python script would be:
a) If you've got python>=3.4 I'd suggest you use the pathlib module, like this:
from pathlib import Path

def fetch_wordlist(filename="wordlist.txt"):
    return (Path(__file__).parent / filename).read_text()

if __name__ == '__main__':
    print(fetch_wordlist())
b) And if you're still using a python version <3.4 or you still want to use the good old os.path module you should do something like this:
import os

def fetch_wordlist(filename="wordlist.txt"):
    with open(os.path.join(os.path.dirname(__file__), filename)) as f:
        return f.read()

if __name__ == '__main__':
    print(fetch_wordlist())
Also, I'd suggest you catch exceptions in the outer callers; the above methods are the standard way to read files in Python, so you don't need to wrap them in a function like fetch_wordlist. Said otherwise, reading a file in Python is an "atomic" operation.
Now, it may happen that you've frozen your program with a freezer such as cx_freeze, pyinstaller, or similar; in that case you'd need to detect that. Here's a simple way to check:
a) using os.path:
import os
import sys

if getattr(sys, 'frozen', False):
    app_path = os.path.dirname(sys.executable)
elif __file__:
    app_path = os.path.dirname(__file__)
b) using pathlib:
import sys
from pathlib import Path

if getattr(sys, 'frozen', False):
    app_path = Path(sys.executable).parent
elif __file__:
    app_path = Path(__file__).parent
Resources living inside a zip file
The above solutions work if the code lives on the file system, but they won't work if the package lives inside a zip file. When that happens, you could use importlib.resources (new in version 3.7), the pkg_resources combo you've already shown in the question (which you could wrap up in some helpers), or a nice 3rd-party library called importlib_resources that should work with both old and modern Python versions:
pypi: https://pypi.org/project/importlib_resources/
documentation: https://importlib-resources.readthedocs.io/en/latest/
Specifically for your particular problem I'd suggest you take a look at https://importlib-resources.readthedocs.io/en/latest/using.html#file-system-or-zip-file.
If you want to know what that library is doing behind the curtain, or you're not willing to install any 3rd-party library, you can find the code for py2 here and py3 here and pull out the relevant bits for your particular problem.
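As a concrete illustration, a minimal sketch with the stdlib importlib.resources API (Python >= 3.7), assuming the script has been packaged as an importable package named makepw (the name from the question) with wordlist.txt declared as package data:
import importlib.resources

def fetch_wordlist():
    # read_binary works whether the package sits on disk or inside a zip
    return importlib.resources.read_binary('makepw', 'wordlist.txt')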
I'm going to go out on a limb and make an assumption because it may drastically simplify your problem. The only way I can imagine that you can claim that this data is "stored in a file that will always be in the same location as the script" is because you created this data, once, and put it in a file in the source code directory. Even though this data is binary, have you considered making the data a literal byte-string in a python file, and then simply importing it as you would anything else?
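A minimal sketch of that idea (all names hypothetical): generate a module once from the data file, then import the data like any other Python object.
# generate_wordlist_module.py -- run once to embed the data
with open('wordlist.txt', 'rb') as f:
    data = f.read()
with open('wordlist_data.py', 'w') as out:
    out.write('WORDLIST = %r\n' % data)

# afterwards, the script itself just does:
# from wordlist_data import WORDLIST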
You're right about the fact that your method of reading a file is a bit unnecessarily complex. Unless you have got a really specific reason to use the importlib and pkg_resources modules, it's rather simple.
import os

def fetch_wordlist():
    # note: a bare relative path resolves against the current working
    # directory, not the directory containing the script
    if not os.path.exists('wordlist.txt'):
        raise FileNotFoundError
    with open('wordlist.txt', 'rb') as wordlist:
        return wordlist.read()
You haven't given much information regarding your script, so I cannot comment on why it doesn't work unless it's installed using pip. My best guess: your script is probably packed into a python package.

How to pass entire directory into python via command line and sys.argv

So in the past, when I used a Unix server for my Python development, if I wanted to pass in an entire folder or directory I would just put an asterisk (*) on the end of it. An example would be something like users/wilkenia/shakespeare/* to pass in a set of files containing each of Shakespeare's plays. Is there a way to do this in Windows? I've tried putting in C:\Users\Alexander\Desktop\coding-data-exam\wx_data\* and the same with the disk name removed. Nothing has worked so far; in fact, it takes in the directory as an argument itself.
Edit: implemented glob, getting a permissions error, even though I'm running as administrator. Here's my code if anyone wants to have a look.
For the sake of showing how you can use pathlib to achieve this result, you can do something like this:
some_script.py:
import sys
from pathlib import Path

path = Path(sys.argv[1])
glob_path = path.glob('*')

for file_path in glob_path:
    print(file_path)
Demo:
python some_script.py C:/some/path/
Output:
C:/some/path/depth_1.txt
C:/some/path/dude.docx
C:/some/path/dude.py
C:/some/path/dude_bock.txt
The nice thing about pathlib is that it takes an object-oriented approach to make working with the filesystem easier.
Note: pathlib is available out of the box from Python 3.4 and above. If you are using an older version of Python, you will need the backported package from PyPI.
Simply: pip install pathlib2
You can use the glob module, it does exactly this.
A quick demo:
In [81]: import glob
In [82]: glob.glob('*')
Out[82]:
[
... # a bunch of my personal files from my cwd
]
If you want to extend this for your use case, you'll need to do something along the lines of:
import sys
import glob

arg = sys.argv[1]

for file in glob.glob(arg):
    ...
You'll read your args with sys.argv and pass them on to glob.
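Note that the Windows cmd shell does not expand wildcards for you, so the pattern reaches sys.argv intact and glob does the expansion; invoking the script would look something like this (path taken from the question):
python some_script.py C:\Users\Alexander\Desktop\coding-data-exam\wx_data\*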

Making Python guess a file Name

I have the following function:
unpack_binaryfunction('third-party/jdk-6u29-linux-i586.bin' , ('/home/user/%s/third-party' % installdir), 'jdk1.6.0_29')
Which uses os.sys to execute a Java deployment. The line, combined with the function (which is unimportant; it just calls some Linux statements), works perfectly.
However, this only works if the 'third-party' folder contains specifically that version of the JDK.
Therefore I need code that will look at the files in the 'third-party' folder, find the one that starts with 'jdk', and fill in the rest of the filename itself.
I am absolutely stuck. Are there any functions or libraries that can help with file searching etc?
To clarify: I need the code to not include the entire: jdk-6u29-linux-i586.bin but to use the jdk-xxxx... that will be in the third-party folder.
This can easily be done using the glob module and a bit of string parsing to extract the version.
import glob
import os.path

for path in glob.glob('third-party/jdk-*'):
    parent, name = os.path.split(path)  # "third-party", "jdk-6u29-linux-i586.bin"
    version, update = name.split('-')[1].split('u')  # ("6", "29")
    unpack_binaryfunction(path, '/home/user/%s/third-party' % installdir,
                          'jdk1.{}.0_{}'.format(version, update))

Way to access resource files in python

What is the proper way to access resources in Python programs?
Basically in many of my python modules I end up writing code like that:
DIRNAME = os.path.split(__file__)[0]
(...)
template_file = os.path.join(DIRNAME, "template.foo")
Which is OK but:
It will break if I start to use Python zip packages
It is boilerplate code
In Java I had a function that did exactly the same --- but worked both when code was lying in bunch of folders and when it was packaged in .jar file.
Is there such a function in Python, or is there any other pattern that I might use?
You'll want to look at using either pkgutil.get_data in the stdlib or pkg_resources from setuptools/distribute. Which one you use probably depends on whether you're already using distribute to package your code as an egg.
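For instance, a minimal sketch with the stdlib pkgutil.get_data (package and resource names assumed):
import pkgutil

# returns the resource as bytes; works when the package is on disk or inside a zip/egg
data = pkgutil.get_data('mypackage', 'template.foo')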
Since version 3.7 of Python, the proper way to access a resource file is to use the importlib.resources library.
One can, for example, use the path function to access a particular file in a Python package:
import importlib.resources

with importlib.resources.path("your.package.templates", "template.foo") as template_file:
    ...
Starting with Python 3.9, this package introduced the files() API, to be preferred over the legacy API.
One can use the files function to access a particular file in a Python package:
template_res = importlib.resources.files("your.package.templates").joinpath("template.foo")
with importlib.resources.as_file(template_res) as template_file:
    ...
For older versions, I recommend installing and using the importlib-resources library. The documentation also explains in detail how to migrate an old implementation based on pkg_resources to importlib-resources.
Trying to understand how we could combine the two aspects together:
Loading resources from the native filesystem
Resources packaged in zipped files
Reading through the quick tutorial on zipimport: http://www.doughellmann.com/PyMOTW/zipimport/
I see the following example:
import sys
sys.path.insert(0, 'zipimport_example.zip')
import os
import zipimport
importer = zipimport.zipimporter('zipimport_example.zip')
module = importer.load_module('example_package')
print module.__file__
print module.__loader__.get_data('example_package/README.txt')
I think the output of __file__ is "zipimport_example.zip/example_package/__init__.pyc".
Need to check how it looks from inside.
But then we could always do something like this:
if ".zip" in example_package.__file__:
...
load using get_data
else:
load by building the correct file path
[Edit:] I have tried to work out the example a bit better.
If the package gets imported as a zipped file, two things happen:
__file__ contains ".zip" in its path.
__loader__ is available in the namespace.
If these two conditions are met, then within the package you could do:
print __loader__.get_data(os.path.join('package_name', 'README.txt'))
Otherwise the module was loaded normally, and you can follow the regular approach to loading the file.
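Putting those two branches together, a minimal sketch (the package and file names follow the tutorial's example and are otherwise assumptions):
import os

import example_package

if '.zip' in example_package.__file__:
    # loaded from a zip: ask the zipimporter for the raw bytes
    data = example_package.__loader__.get_data('example_package/README.txt')
else:
    # loaded from the filesystem: build a path inside the package directory
    path = os.path.join(os.path.dirname(example_package.__file__), 'README.txt')
    with open(path, 'rb') as f:
        data = f.read()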
I guess the zipimport standard python module could be an answer...
EDIT: well, not the use of the module directly, but using sys.path as shown in the example could be a good way:
I have a zip file test.zip with one python module test and a file test.foo inside.
To test that the zipped python module test can be aware of test.foo, it contains this code:
import os

DIRNAME = os.path.dirname(__file__)
if os.path.exists(os.path.join(DIRNAME, 'test.foo')):
    print 'OK'
else:
    print 'KO'
Test looks ok:
>>> import sys
>>> sys.path.insert(0, r'D:\DATA\FP12210\My Documents\Outils\SVN\05_impl\2_tools\test.zip')
>>> import test
OK
>>>
So a solution could be to loop over your zip file to retrieve all python modules and add them to sys.path; ideally this piece of code would be the first one loaded by your application.
