HDF5: alias for data-keys in caffe - python

I'm trying to load HDF5 data as the input data for my caffe net. So far I can create those HDF5 databases and the list file. I've also read the simple example at https://ceciliavision.wordpress.com/2016/03/21/caffe-hdf5-layer/. It states that the dataset keys are also the names of the data layers in caffe. But what I want is to give them some kind of alias in caffe; is that possible? The reason is that I have two HDF5 databases with the same dataset structure inside, and therefore the same dataset keys. Is there a name clash if I load both HDF5 databases in the same net, and if so, can I change the names without changing the HDF5 databases themselves?
Thanks for help!
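One workaround, in case the layer names really do have to match the dataset keys: HDF5 hard links can add an alias key to an existing file without copying the underlying data. A minimal sketch with h5py (the file and key names here are made up):

import h5py

# Open one of the two databases and add alias keys. A hard link makes
# 'data_b' refer to the exact same dataset on disk as 'data'; nothing
# is duplicated and the original key keeps working.
with h5py.File('train_b.h5', 'r+') as f:
    f['data_b'] = f['data']
    f['label_b'] = f['label']

The net could then read data_b/label_b from one database and data/label from the other, avoiding the clash.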

MXNet parameter serialisation with numpy

I want to use a pre-trained MXNet model on the s390x architecture, but it doesn't seem to work. This is because the pre-trained models are little-endian, whereas s390x is big-endian. So I'm trying to use https://numpy.org/devdocs/reference/generated/numpy.lib.format.html, which works on both little-endian and big-endian systems.
One way I've found to solve this is to load the model parameters on an x86 machine, call asnumpy, and save them through numpy. Then load the parameters on the s390x machine using numpy and convert them back to MXNet. But I'm not really sure how to code it. Can anyone please help me with that?
UPDATE
It seems the question is unclear, so I'm adding an example that better explains what I want to do in 3 steps.
Load a preexisting model from MXNet, something like this:
net = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=True, ctx=mx.cpu())
Export the model. The following code saves the model parameters in a .params file, but this binary file has endian issues. So, instead of saving the model directly through the MXNet API, I want to save the parameters file using numpy (https://numpy.org/devdocs/reference/generated/numpy.lib.format.html), because a numpy binary file (.npy) is endian-independent. I am not sure how I can convert the parameters of an MXNet model into numpy format and save them.
net.export("./my_model")
Load the model. The following code loads the model back from the .params file.
net = mx.gluon.nn.SymbolBlock.imports(symbol_file="my_model-symbol.json",
                                      input_names=["data"],
                                      param_file="my_model-0000.params",
                                      ctx=mx.cpu())
Instead of loading through the MXNet API, I want to use numpy to load the .npy file that we created in step 2. After we have loaded the .npy file, we need to convert it back to MXNet parameters, so I can finally use the model in MXNet. A rough sketch of the round trip I have in mind follows.
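An untested sketch of that round trip (params.npz is a made-up filename, and it assumes every parameter can be materialized with Parameter.data() and that non-array attributes can be ignored):

import numpy as np

# On the x86 machine: pull each parameter out as a plain numpy array
# and save them all into one .npz archive, keyed by parameter name.
arrays = {name: p.data().asnumpy() for name, p in net.collect_params().items()}
np.savez('params.npz', **arrays)

# On the s390x machine: load the arrays back and push them into a
# freshly constructed network with the same architecture.
loaded = np.load('params.npz')
for name, param in net.collect_params().items():
    param.set_data(mx.nd.array(loaded[name]))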
Starting from the code snippets posted in the other question, Save/Load MXNet model parameters using NumPy:
It appears that mxnet has an option to store data internally as numpy arrays:
mx.npx.set_np(True, True)
Unfortunately, this option doesn't do what I hoped (my IPython session crashed).
The parameters are a dict of mxnet.gluon.parameter.Parameter instances, each of them containing attributes of other special datatypes. Disentangling this so that you can store it as a large number of pure numpy arrays (or a collection of them in an .npz file) is a hopeless task.
Fortunately, python has pickle to convert complex data structures into something more or less portable:
# (mxnet/resnet setup skipped)
parameters = resnet.collect_params()
import pickle
with open('foo.pkl', 'wb') as f:
    pickle.dump(parameters, f)
To restore the parameters:
with open('foo.pkl', 'rb') as f:
    parameters_loaded = pickle.load(f)
Essentially, it looks like resnet.save_parameters() as defined in mxnet/gluon/block.py gets the parameters (using _collect_parameters_with_prefix()) and writes them to a file using a custom write function which appears to be compiled from C (I didn't check the details).
You can save the parameters using pickle instead.
For loading, load_parameters (also in util.py) contains this code (with sanity checks removed):
for name in loaded:
    params[name]._load_init(loaded[name], ctx, cast_dtype=cast_dtype, dtype_source=dtype_source)
Here, loaded is a dict as loaded from the file. From examining the code, I don't fully grasp exactly what is being loaded - params seems to be a local variable in the function that is not used anymore. But it's worth a try to start from here, by writing a replacement for the load_parameters function. You can "monkey-patch" a function into an existing class by defining a function outside the class like this:
def my_load_parameters(self, *args, **kwargs):
    ...  # put your modified implementation here

mx.gluon.Block.load_parameters = my_load_parameters
Disclaimers/warnings:
even if you get save/load via pickle to work on a single big-endian system, it's not guaranteed to work between different-endian systems. The pickle protocol itself is endian-neutral, but if the floating-point values (deep inside the mxnet.gluon.parameter.Parameter instances) were stored as a raw data buffer in machine-endian convention, then pickle is not going to magically guess that groups of 8 bytes in the buffer need to be reversed. I think numpy arrays are endian-safe when pickled; see the small check after this list.
Pickle is not very robust if the underlying class definitions change between pickling and unpickling.
Never unpickle untrusted data.
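As a quick sanity check of the endianness point above, numpy records the byte order as part of the dtype inside the pickle payload, so a round trip preserves the values:

import pickle
import numpy as np

# The dtype (including its byte order) travels inside the pickle payload,
# so unpickling on any machine reproduces the same values.
a = np.arange(3, dtype='>f8')   # explicitly big-endian float64
b = pickle.loads(pickle.dumps(a))
assert (a == b).all() and b.dtype == a.dtype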

use python's sklearn module with custom dataset

I've never used python before, and I find myself in dire need of using the sklearn module in my node.js project for machine learning purposes.
I have been trying all day to understand the code examples in said module, and now that I kind of understand how they work, I don't know how to use my own data set.
Each of the built-in data sets has its own function (load_iris, load_wine, load_breast_cancer, etc.), and they all load data from a .csv and an .rst file. I can't find a function that will allow me to load my own data set. (There's a load_data function, but it seems to be for internal use by the three loaders I mentioned, since I can't import it.)
How could I do that? What's the proper way to use sklearn with any other data set? Does it always have to be a .csv file? Could it be programmatically provided data (array, object, etc)?
In case it's important: all those built-in data sets have numeric features, while my data set has both numeric and string features to be used in the decision tree.
Thanks
You can load whatever you want and then use sklearn models.
If you have a .csv file, pandas would be the best option.
import pandas as pd
mydataset = pd.read_csv("dataset.csv")
X = mydataset.values[:, 0:10]  # let's assume that the first 10 columns are the features/variables
y = mydataset.values[:, 10]    # let's assume that the 11th column (index 10) has the target values/classes
...
sklearn_model.fit(X, y)
Similarly, you can load .txt or .xls files.
The important thing in order to use sklearn models is this:
X should always be a 2D array with shape [n_samples, n_variables]
y should be the target variable.
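Since your data set mixes numeric and string features, and sklearn's decision trees only accept numeric input, the string columns need to be encoded first. A minimal sketch using one-hot encoding (the column name "target" is made up):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

mydataset = pd.read_csv("dataset.csv")
# One-hot encode the string columns; numeric columns pass through unchanged.
X = pd.get_dummies(mydataset.drop(columns=["target"]))
y = mydataset["target"]

model = DecisionTreeClassifier()
model.fit(X, y)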

How to load our own data set for training

I want to train a model that will tell us the PM2.5 (this value describes AQI) of any image. For this I use a CNN. I am using tensorflow for this purpose. I am new to this field. Please tell me how we can load our own dataset and separate its names and tags. The format of an image name is "imageName_tag" (e.g. ima01_23.4).
I think we need more information about your case regarding the "how to load our own dataset" part.
However, if your dataset is on your computer and you want to access it from python, I invite you to take a look at the "glob" and "os" libraries.
To split the name (which in your case is "imageName_tag") you can use:
string = "imageName_tag"
name, tag = string.split('_')
As you'll have to do this for all your data, you'll have to run it in a loop and store the extracted information in lists, as in the sketch below.
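For instance, a minimal sketch along those lines (the folder name and file extension are assumptions):

import glob
import os

names, tags = [], []
for path in glob.glob("dataset/*.jpg"):  # hypothetical folder and extension
    base = os.path.splitext(os.path.basename(path))[0]  # e.g. "ima01_23.4"
    name, tag = base.split('_')
    names.append(name)
    tags.append(float(tag))  # the tag is the numeric PM2.5 value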

Export PSD Layers to EXR in Python

I'm trying to write a program to read in a .psd file, split the layers into individual images (maintaining the original image's dimensions) and export them as EXR files.
I'm currently trying to use the OpenImageIo library to accomplish this but the documentation isn't particularly clear on how this can be achieved in python.
I've successfully managed to read the full .psd and export it to .exr, but nothing I've been trying seems to indicate that there is more than one layer (subimage) to interact with.
Is there:
something obvious that I'm missing, or
a better way to accomplish this?
Side note:
I have had some success using psd_tools2 but the images can't be exported as .exr nor are they the correct dimensions.
This is actually relatively straightforward; however, there is one caveat in that it only seems to be supported for 8-bit .psd files at the moment.
import OpenImageIO as oiio

sourcefile = '/path/to/sourcefile.psd'
buf = oiio.ImageBuf(sourcefile)

# Each layer is exposed as a separate subimage; point the buffer at each
# subimage in turn and write it out as its own EXR file.
for layer in range(buf.nsubimages):
    buf.reset(sourcefile, subimage=layer)
    buf.write('/tmp/mylayer_{l}.exr'.format(l=layer))

how to view sparse matrices outside python environment

I am working with sparse matrices which are 11685 by 85730. I am only able to store them as a .pickle file. I also want to view the file outside the python environment. I tried saving them as .txt and .csv files, but they were of no help. Can anybody suggest a suitable format and library so that I can view those matrices outside the python environment?
Python allows you to write to many formats that are readable outside of python. .csv is one format, but there are also HDF5 and netCDF4, among others (those are meant to store array data, though).
http://code.google.com/p/netcdf4-python/
http://code.google.com/p/h5py/
Or you could save them in a matlab readable format:
http://docs.scipy.org/doc/scipy/reference/tutorial/io.html
What you use should depend on how you plan on accessing the data outside of python.
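For a scipy sparse matrix specifically, the Matrix Market format is worth a look: it is plain text, stores only the nonzero entries, and can be opened in any editor or read by MATLAB, R, and many other tools. A minimal sketch, assuming M is a scipy.sparse matrix:

from scipy.io import mmwrite, mmread

# Writes a plain-text file: a header, then one "row col value" line
# per nonzero entry.
mmwrite('matrix.mtx', M)

# The file can be read back later, in python or elsewhere.
M2 = mmread('matrix.mtx')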
