How to use the PCACompute function from Python in OpenCV 3?

The cv2.PCACompute function worked well in OpenCV 2.4 using the following syntax:
import cv2
mean, eigvec = cv2.PCACompute(data)
The function exists in OpenCV 3.1, but raises the following exception:
TypeError: Required argument 'mean' (pos 2) not found
The C++ documentation is not very helpful at explaining how I should call it from Python. I'm guessing that InputOutputArray arguments are now also mandatory arguments in the Python function signature, but I am unable to find a way to make them work.
Is there a way I can call it properly?
(Note: I know there are other ways I can run a PCA, and I'll probably end up using one of them. I'm just curious about how the new OpenCV bindings work.)

Simple answer:
mean, eigvec = cv2.PCACompute(data, mean=None)
With details:
Let's search for PCACompute in the source first. We find this:
// modules/core/src/pca.cpp, lines 351-360
void cv::PCACompute(InputArray data, InputOutputArray mean,
                    OutputArray eigenvectors, int maxComponents)
{
    CV_INSTRUMENT_REGION()

    PCA pca;
    pca(data, mean, 0, maxComponents);
    pca.mean.copyTo(mean);
    pca.eigenvectors.copyTo(eigenvectors);
}
OK, now we read the documentation:
C++: PCA& PCA::operator()(InputArray data, InputArray mean, int flags, int maxComponents=0)
Python: cv2.PCACompute(data[, mean[, eigenvectors[, maxComponents]]]) → mean, eigenvectors
Parameters:
data – input samples stored as the matrix rows or as the matrix columns.
mean – optional mean value; if the matrix is empty (noArray()), the mean is computed from the data.
flags – operation flags; currently the parameter is only used to specify the data layout.
CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows.
CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
maxComponents – maximum number of components that PCA should retain; by default, all the components are retained.
That is to say,
# py
mean, eigvec = cv2.PCACompute(data, mean=None)
is equivalent to
// cpp
PCA pca;
pca(data, noArray(), CV_PCA_DATA_AS_ROW);  // mean = noArray(), flags = CV_PCA_DATA_AS_ROW
...
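As a minimal, self-contained sketch of the same call (the random data, the printed shapes and the PCAProject step are my additions for illustration, not part of the original question):
import numpy as np
import cv2

# 100 samples stored as rows, 5 features each (the CV_PCA_DATA_AS_ROW layout)
data = np.random.rand(100, 5).astype(np.float32)

# mean=None lets OpenCV compute the mean from the data itself
mean, eigvec = cv2.PCACompute(data, mean=None)
print(mean.shape)    # (1, 5)
print(eigvec.shape)  # (5, 5) - one eigenvector per row, sorted by eigenvalue

# project the samples onto the first two principal components
projected = cv2.PCAProject(data, mean, eigvec[:2])
print(projected.shape)  # (100, 2)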

Related

Is there a way to extract pixel color information from an image with OpenImageIO's python bindings

I want to ask the more experienced people how to get the RGB values of a pixel in an image using oiio's python bindings.
I have only just begun using oiio and am unfamiliar with the library as well as image manipulation on code level.
I poked around the documentation and I don't quite understand how the parameters work. They don't seem to work the same way as in Python, and I'm having a hard time trying to figure it out, as I don't know C.
A) What command to even use to get pixel information (seems like get_pixel could work)
and
B) How to get it to work. I'm not understanding the parameters requirements exactly.
Edit:
I'm trying to convert the C example in the documentation that visits all pixels to compute an average color into something Pythonic, but am getting nowhere.
Would appreciate any help, thank you.
Edit: adding the code
buffer = oiio.ImageBuf('image.tif')
array = buffer.read(0, 0, True)
print buffer.get_pixels(array)
the error message I get is:
# Error: Python argument types in
# ImageBuf.get_pixels(ImageBuf, bool)
# did not match C++ signature:
# get_pixels(class OpenImageIO::v1_5::ImageBuf, enum OpenImageIO::v1_5::TypeDesc::BASETYPE)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, enum OpenImageIO::v1_5::TypeDesc::BASETYPE, struct OpenImageIO::v1_5::ROI)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, struct OpenImageIO::v1_5::TypeDesc)
# get_pixels(class OpenImageIO::v1_5::ImageBuf, struct OpenImageIO::v1_5::TypeDesc, struct OpenImageIO::v1_5::ROI)
OpenImageIO has several classes for dealing with images, with different levels of abstraction. Assuming that you are interested in the ImageBuf class, I think the simplest way to access individual pixels from Python (with OpenImageIO 2.x) would look like this:
import OpenImageIO as oiio
buf = oiio.ImageBuf("foo.jpg")
p = buf.getpixel(50, 50)  # x, y
print(p)
p will be a tuple with one float per channel, so this will produce output like
(0.01148223876953125, 0.0030574798583984375, 0.0180511474609375)
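For the original goal (visiting all pixels to compute an average color), here is a hedged sketch assuming OpenImageIO 2.x, where get_pixels returns a numpy array; the file name is just a placeholder:
import OpenImageIO as oiio

buf = oiio.ImageBuf("foo.jpg")

# get_pixels returns the whole image as a numpy array of shape
# (height, width, nchannels) when asked for FLOAT data
pixels = buf.get_pixels(oiio.FLOAT)

# average color: mean over the two spatial axes, one value per channel
avg = pixels.mean(axis=(0, 1))
print(avg)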

Can someone explain how arrays and scalars are handled in a Python code snippet

I have this Python code snippet that I am trying to understand. I don't understand how scalars operate on arrays in all cases. In most code I read, it makes sense that operations work on each value of an array.
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
var_norm = sqrt(sig_sq_samples/kN)
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I want to know how each line functions. The reason is that I don't have a Linux machine set up with the library, and I thought someone may be able to help me understand this Python code I found in an article. I cannot set up the environment in a reasonable amount of time.
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
sig_sq_samples (I'm assuming) - is an array of beta * each value of the array that invgamma.rvs() returns.
var_norm - I have no idea what this value is supposed to be, because the norm.rvs function below takes a scalar (scale=var_norm).
In short, how is sqrt(sig_sq_samples/kN), with kN also a scalar, returning back a scalar? What is happening here? This one line is what is getting me. Like I said earlier, sig_sq_samples is an array (I hope I'm not wrong about the line that produces sig_sq_samples). At one point or another the values being worked on are scalars.
I am from C#, where hard types are used, and I have worked with scripting languages such as Perl, where I got a lot of experience with what "shortcut" operations do. For example, C# does not allow you to multiply a scalar by an array. I tried to look up how scalars work with arrays, but it didn't clarify this code for me. Anyone answering is more than welcome to look up the functions above in case I am wrong about anything. I have put a lot of effort into this and I have many years of development experience. Either this code snippet is wrong or I'm just not seeing something really obvious.
In the line
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
var_norm has n_samples elements, so what is happening is that the ith of the n_samples variates is generated using the ith scale parameter of var_norm, i.e. var_norm[i].
Internally the code does
vals = vals * scale + loc
When scale is an array, this uses broadcasting, which is a common feature of numpy. norm.rvs has already generated an array of n_samples random values; when it is multiplied by scale, the multiplication is element-wise between the two arrays, so the left-hand side is also an array. For more information see here.
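A small sketch of that broadcasting behaviour (the concrete numbers are just an illustration):
import numpy as np
from scipy.stats import norm

n_samples = 5
scales = np.array([0.1, 1.0, 2.0, 5.0, 10.0])  # one scale per variate

# size matches len(scales), so the ith variate is drawn with
# standard deviation scales[i] (scale broadcasts against the output)
samples = norm.rvs(loc=0.0, scale=scales, size=n_samples)
print(samples.shape)  # (5,)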
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
if
invgamma.rvs() - returns an array of numeric values
beta - is a scalar value
then
sig_sq_samples = beta*invgamma.rvs(alpha,size=n_samples)
produces another array of the same size. Scalar beta just multiplies each element.
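A one-liner to see that scalar-times-array behaviour (toy numbers of my own):
import numpy as np

beta = 3.0
arr = np.array([1.0, 2.0, 4.0])
print(beta * arr)  # [ 3.  6. 12.] - the scalar multiplies each element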
In
var_norm = sqrt(sig_sq_samples/kN)
kN is a scalar doing the same thing - dividing each element. I assume sqrt is numpy.sqrt, which takes the square root of each element. So var_norm is again an array of the original size (the size of the invgamma.rvs() result).
mu_samples = norm.rvs(mean_norm,scale=var_norm,size=n_samples)
I don't know what norm.rvs does, or where it is from. It's not numpy, but it could be from a package in scipy; I'd have to google it. It takes one positional argument, here mean_norm, and (at least) two keyword values. n_samples is probably a number, e.g. 100. But scale could certainly take an array, such as var_norm.
======================
http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rv_continuous.rvs.html#scipy.stats.rv_continuous.rvs
appears to be the documentation for the rvs method (norm is a subclass of rv_continuous).
Arguments are:
arg1, arg2, arg3,... : array_like
The shape parameter(s) for the distribution (see docstring of the instance object for more information).
scale : array_like, optional
Scale parameter (default=1).
size : int or tuple of ints, optional
Defining number of random variates (default is 1).
and the result is
rvs : ndarray or scalar
Random variates of given size.
I'm guessing invgamma.rvs is the similar method for a different subclass. alpha must be the shape argument for the first, and mean_norm the loc argument for the 2nd.
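Putting the guesses above together, here is a runnable toy version of the snippet; the parameter values are mine, chosen only to show the shapes:
import numpy as np
from scipy.stats import invgamma, norm

n_samples = 4
alpha, beta, kN, mean_norm = 2.0, 3.0, 10.0, 0.0

sig_sq_samples = beta * invgamma.rvs(alpha, size=n_samples)       # shape (4,)
var_norm = np.sqrt(sig_sq_samples / kN)                           # shape (4,)
mu_samples = norm.rvs(mean_norm, scale=var_norm, size=n_samples)  # shape (4,)

print(sig_sq_samples.shape, var_norm.shape, mu_samples.shape)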

Most suitable clustering method for a dataset containing 10 dimension numerical arrays

I have a data set (~4k samples) of the following structure:
sample type: string - very general
sample sub type: string
sample model number: number - may be None
signature: number array[10]
sampleID: string - unique id
I want to cluster the samples based on the "signature" (I have a function that measures the "distance" between one signature and another), so that when I encounter a new signature I'll be able to tell which type/sub type the sample belongs to.
Which algorithm should I use?
P.S. I am using Python and scikit-learn, and I also need to somehow visualize the results.
Since you already have a distance function, and your data set is tiny, just use HAC (hierarchical agglomerative clustering), the grandfather of all clustering algorithms; a sketch follows below.
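A minimal sketch of how that could look with scipy; the random signatures, the placeholder distance function and the choice of 20 clusters are all assumptions on my part:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# toy stand-in for the ~4k signatures, one 10-dimensional row per sample
signatures = np.random.rand(4000, 10)

def my_distance(u, v):
    # placeholder for the custom distance function mentioned in the question
    return np.linalg.norm(u - v)

# condensed pairwise distance matrix computed with the custom metric
dists = pdist(signatures, metric=my_distance)

# hierarchical agglomerative clustering on the precomputed distances
Z = linkage(dists, method="average")
labels = fcluster(Z, t=20, criterion="maxclust")  # e.g. cut into 20 clusters

# quick visualisation of the hierarchy
dendrogram(Z, truncate_mode="lastp", p=30)
plt.show()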

How to read NetCDF variable float data into a Numpy array with the same precision and scale as the original NetCDF float values?

I have a NetCDF file which contains a variable with float values with precision/scale == 7/2, i.e. there are possible values from -99999.99 to 99999.99.
When I take a slice of the values from the NetCDF variable and look at them in my debugger, I see that the values I now have in my array have more precision/scale than what I see in the original NetCDF. For example, when I look at the values in the ToolsUI/ncdump viewer they display as '-99999.99' or '12.45', but when I look at the values in the slice array they look like '-99999.9921875' (a greater scale length). So if I'm using '-99999.99' as the expected value to indicate a missing data point, then I won't get a match with what gets pulled into the slice array, since those values have a greater scale length and the additional digits in the scale are not just zeros for padding.
For example I see this if I do a ncdump on a point within the NetCDF dataset:
Variable: precipitation(0:0:1, 40:40:1, 150:150:1)
float precipitation(time=1348, lat=180, lon=360);
  :units = "mm/month";
  :long_name = "precipitation totals";
data:
{
  {
    {-99999.99}
  }
}
However if I get a slice of the data from the variable like so:
value = precipitationVariable[0:1:1, 40:41:1, 150:151:1]
then I see it like this in my debugger (Eclipse/PyDev):
value == ndarray: [[[-99999.9921875]]]
So it seems as if the NetCDF dataset values that I read into a Numpy array are not being read with the same precision/scale of the original values in the NetCDF file. Or perhaps the values within the NetCDF are actually the same as what I'm seeing when I read them, but what's shown to me via ncdump is being truncated due to some format settings in the ncdump program itself.
Can anyone advise as to what's happening here? Thanks in advance for your help.
BTW I'm developing this code using Python 2.7.3 on a Windows XP machine and using the Python module for the NetCDF4 API provided here: https://code.google.com/p/netcdf4-python/
There is no simple way of doing what you want, because the values are stored (and kept by numpy) as single-precision floats, and -99999.99 is not exactly representable in binary floating point, so you will always see trailing digits after .99.
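A quick way to see this with plain numpy (nothing netCDF-specific):
import numpy as np

x = np.float32(-99999.99)  # the file stores 32-bit floats
print(float(x))            # -99999.9921875: -99999.99 is not representable exactly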
However, netCDF already provides a mechanism for missing data (see the best practices guide). How was the netCDF file written in the first place? missing_value is a special variable attribute that should be used to indicate those values that are missing. In the C and Fortran interfaces, when the file is created all variable values are initially set to a fill value; if you then only write the valid data, the missing_value (or _FillValue) attribute tells readers which value marks the missing points (see more about the fill values in the C and Fortran interfaces). This is the recommended approach. The Python netCDF4 module plays well with these missing values, and such variables are read as masked arrays in numpy.
If you must work with the file you currently have, then I'd suggest creating a mask to cover values around your missing value:
import numpy as np

value = precipitationVariable[:]
# mask a small window around -99999.99, the value used to flag missing data
mask = (value < -99999.98) & (value > -100000.00)
value = np.ma.MaskedArray(value, mask=mask)
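For comparison, here is a short sketch of the recommended route, assuming the file (hypothetically named 'precip.nc' here) carries a missing_value or _FillValue attribute:
from netCDF4 import Dataset
import numpy as np

nc = Dataset("precip.nc")               # hypothetical file name
precip = nc.variables["precipitation"]

# with a missing_value/_FillValue attribute set, netCDF4 hands back
# a numpy masked array automatically
values = precip[:]
print(np.ma.is_masked(values))

# without the attribute, mask "close to -99999.99" explicitly
values = np.ma.masked_values(precip[:], -99999.99, atol=0.01)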

python scipy weave long integer

I'm using scipy.weave to improve the performance of my Python code. Basically, I have to go through a long array of shape (1024^3, 3), i.e. an array containing 1024^3 elements with 3 entries each, compute several things for each element and then fill another array.
The problem is that I get a segmentation fault when the array is larger than about (850**3, 3). The segmentation fault takes place when I try to read the value of the array at the position (a, 3), where a = 715827882. Note that 3*a ~ 2^31. I have carefully explored this issue, and it seems to me that I can't go through arrays whose length is larger than the size of an integer variable.
In fact, this simple program
################################
import numpy as np
import scipy.weave as wv

def printf():
    a = 3*1024**3
    support = """
    #include <iostream>
    using namespace std;
    """
    code = """
    cout << a << endl;
    """
    wv.inline(code, ['a'],
              type_converters=wv.converters.blitz,
              support_code=support, libraries=['m'])

printf()
#########################################
outputs -1073741824 instead of 3221225472, which means (I think) that the variable a is treated in the C code as a 32-bit integer instead of a 64-bit one.
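A quick pure-Python check of that 32-bit wraparound interpretation (just two's-complement arithmetic, nothing from scipy.weave itself):
a = 3 * 1024**3            # 3221225472
wrapped = a & 0xFFFFFFFF   # keep the low 32 bits
if wrapped >= 2**31:       # reinterpret as a signed 32-bit integer
    wrapped -= 2**32
print(wrapped)             # -1073741824, matching the weave output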
Does anyone know how to solve this? Of course, I could split my array into pieces smaller than 2^31, but I find this very inefficient.
Thanks.
