reduce() hstack python

I am trying to use the reduce() function to create a function, hstackm(), which horizontally stacks multiple arrays. As a simple example, let's say
>>> M = eye(4)
>>> M
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
>>> hstack([M, M])
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])
This works as I want. Now I define
>>> hstackm = lambda *args: reduce(hstack, args)
and try to repeat the hstack() call from the previous case:
>>> hstackm([M, M])
[array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]]),
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])]
This is incorrect: the input list comes back unchanged. How do I define hstackm() to obtain the proper output?
My final objective is to create an hstackm() function that stacks sparse matrices, if that is possible. Something like:
hstackm = lambda *args: reduce(sparse.hstack, args)
The args would be csr_matrix or lil_matrix objects.
Thank you.

In [16]: hstackm = lambda args: reduce(lambda x,y:hstack((x,y)), args)
In [17]: hstackm([M,M])
Out[17]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])

The hstack function takes one parameter, a sequence of matrices. reduce() calls its function with two parameters instead, each a matrix.
Change your stacking function to accept an arbitrary number of arguments instead:
def hstack(*matrices):
    ....
instead of hstack(matrices), then call it as hstack(M, M).
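
It may help to see why the original hstackm([M, M]) returned the list unchanged. With *args, the single list becomes a one-element tuple, and reduce() hands back a lone element without ever calling the stacking function. The following is a minimal sketch, assuming Python 3 (where reduce lives in functools) and NumPy imported as np; pair_hstack is a hypothetical helper name, not part of NumPy.

```python
from functools import reduce
import numpy as np

M = np.eye(4)

# reduce() on a one-element sequence returns that element untouched,
# so the stacking function is never called.
single = reduce(np.hstack, ([M, M],))
print(type(single))  # the original list comes back

def pair_hstack(x, y):
    # Hypothetical helper: adapts the two-argument call made by reduce()
    # to np.hstack, which expects one sequence of arrays.
    return np.hstack((x, y))

def hstackm(*args):
    # Takes the arrays as separate positional arguments.
    return reduce(pair_hstack, args)

stacked = hstackm(M, M, M)
print(stacked.shape)  # (4, 12)
```

Since np.hstack already accepts any number of arrays in one sequence, np.hstack(args) would give the same result without reduce(). scipy.sparse.hstack also takes a single sequence of blocks, so the same pattern should carry over to sparse matrices.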

Related

Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?

I have been using Gaussian Mixture Models (GMM) to model a set of peaks in a 2D numpy array (a).
a = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 100., 1000., 100., 2., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 100., 100., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 2., 1., 2., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
The problem is that in order to fit a GMM to my data with sklearn I have to first generate a density_array, which holds a huge amount of data points depending on the height of the peaks in a.
def convert_to_density_array(array):
    """
    Convert an array to a density array
    """
    density_list = []
    # iterate over each i,j coordinate in the array
    for (i, j), value in np.ndenumerate(array):
        for x in range(int(value)):
            density_list.append((i, j))
    return np.array(density_list)
density_array = convert_to_density_array(a)
gmm = mixture.GaussianMixture(n_components=2,covariance_type='full').fit(density_array)
Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?

You can store the data with less precision by adding dtype=np.float32 to your np.array call, which is fine as long as you can accept about 7 significant digits instead of about 15 (entirely acceptable in your case). That is the only way to hold the same data in a smaller memory footprint and still pass it to the GMM.
What you are trying to do is curve fitting rather than data modelling, so you could run SciPy's curve_fit on your original data without building density_array at all. You would pass it a function made of two Gaussians and, in a loop, randomly change the initial estimate until you get the smallest error. Writing that code takes some time, so consider this approach only if you cannot fit your data in memory any other way.
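
Separately from the memory question, the nested Python loop in convert_to_density_array can be replaced by array operations. The sketch below assumes non-negative values in the input; convert_to_density_array_fast is a hypothetical name. It builds the same coordinate list in the same order but does not make the result any smaller.

```python
import numpy as np

def convert_to_density_array_fast(array):
    # Truncate each value to an integer count, as int(value) does in the loop.
    counts = array.astype(int).ravel()
    # All (i, j) coordinates in row-major order, one row per cell.
    coords = np.indices(array.shape).reshape(array.ndim, -1).T
    # Repeat each coordinate as many times as its count.
    return np.repeat(coords, counts, axis=0)

small = np.array([[0., 2.],
                  [1., 3.]])
density = convert_to_density_array_fast(small)
print(density.shape)  # (6, 2)
```

The output has one row per unit of height, so its size still grows with the peak values, exactly as in the loop version.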

Python sklearn OneHotEncoding categorical and sometimes repeated values

This is my problem with sklearn's OneHotEncoder.
With an array a = [1,2,3,4,5,6,7,8,9,22], i.e. ALL UNIQUE values, with a.shape = [10,1] (after reshape(-1,1)), a [10,10] matrix of one-hot encoded values is returned.
array([[ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
But with an array like a = [1,2,2,4,4,6,7,8,9,22], i.e. NON-UNIQUE values, with a.shape = [10,1] (after reshape(-1,1)), a [10,8] matrix of one-hot encoded values is returned.
array([[ 1., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1.]])
But I cannot use this, as my input placeholder expects a [10,10] matrix as input. Can anyone help me handle non-unique values in sklearn's OneHotEncoder?
P.S. Adding the parameter n_values=10 gives the error ValueError: Feature out of bounds for n_values=10

Do you know all the values your categorical feature can take? If so, you can do something like this:
enc = OneHotEncoder()
enc.fit(np.asarray([1,2,3,4,5,6,7,8,9,22]).reshape(-1, 1)) #fit your encoder to the values
data_for_encoding = np.asarray([1,2,2,4,4,6,7,8,9,22]).reshape(-1, 1) #your data
sparse_matrix = enc.transform(data_for_encoding) #encoded data
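
The idea behind fitting on the full set of values is that the column layout is fixed by the known categories, not by whatever happens to appear in the data. As an illustration only, here is the same idea in plain NumPy without sklearn; the variable names are assumptions made for this sketch.

```python
import numpy as np

# Every value the feature can take: one output column per category.
categories = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 22])
# Data with repeated values; 3 and 5 never occur.
data = np.array([1, 2, 2, 4, 4, 6, 7, 8, 9, 22])

# Compare every sample with every category to build the one-hot rows.
one_hot = (data[:, None] == categories[None, :]).astype(float)
print(one_hot.shape)  # (10, 10)
```

The columns for the missing categories 3 and 5 are simply all zeros, so the width stays at 10 regardless of which values show up.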

Is there a numpy way to reduce arrays?

I have this numpy array, which is a concatenation of other numpy arrays:
array([array([[ 0., 1., 0., 0., 1., 0.]]),
array([[ 1., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 1.]]),
array([[ 0., 1., 0., 0., 0., 1.]]),
array([[ 0., 1., 0., 1., 0., 0.]]),
array([[ 1., 0., 0., 0., 0., 1.]])], dtype=object)
Its current shape is (6,). What I want is this, with shape (6,6):
array([[ 0., 1., 0., 0., 1., 0.],
[ 1., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1.],
[ 0., 1., 0., 0., 0., 1.],
[ 0., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 1.]], dtype=object)
Is there a numpy way to solve this problem, or do I have to loop through the arrays and append them?

If the display is accurate, and the array really is (6,), then we have to recreate it with:
In [27]: array=np.array
In [28]: alist = [array([[ 0., 1., 0., 0., 1., 0.]]),
...: array([[ 1., 0., 0., 1., 0., 0.]]),
...: array([[ 0., 0., 0., 0., 1., 1.]]),
...: array([[ 0., 1., 0., 0., 0., 1.]]),
...: array([[ 0., 1., 0., 1., 0., 0.]]),
...: array([[ 1., 0., 0., 0., 0., 1.]])]
...:
In [29]: A = np.empty((6,),object)
In [30]: A
Out[30]: array([None, None, None, None, None, None], dtype=object)
In [31]: A[:]=alist
In [32]: A
Out[32]:
array([array([[ 0., 1., 0., 0., 1., 0.]]),
array([[ 1., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 1.]]),
array([[ 0., 1., 0., 0., 0., 1.]]),
array([[ 0., 1., 0., 1., 0., 0.]]),
array([[ 1., 0., 0., 0., 0., 1.]])], dtype=object)
reshape does not work:
In [33]: A.reshape(6,6)
...
ValueError: cannot reshape array of size 6 into shape (6,6)
But the array can be treated as a list, and given to concatenate:
In [34]: np.concatenate(A, axis=1)
Out[34]:
array([[ 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0.,
0., 0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 1.,
0., 1., 0., 0., 1., 0., 0., 0., 0., 1.]])
In [35]: np.concatenate(A, axis=0)
Out[35]:
array([[ 0., 1., 0., 0., 1., 0.],
[ 1., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1.],
[ 0., 1., 0., 0., 0., 1.],
[ 0., 1., 0., 1., 0., 0.],
[ 1., 0., 0., 0., 0., 1.]])
Concatenate on the list works just as well: np.concatenate(alist, axis=0)
I should note that the resulting array is dtype float, not object. It could be converted with astype, but who would want that?
Simple copy-n-paste produces a 3d array, since the outer array call ignores the inner division and creates an array with as many dimensions as it can:
In [37]: array([array([[ 0., 1., 0., 0., 1., 0.]]),
...: array([[ 1., 0., 0., 1., 0., 0.]]),
...: array([[ 0., 0., 0., 0., 1., 1.]]),
...: array([[ 0., 1., 0., 0., 0., 1.]]),
...: array([[ 0., 1., 0., 1., 0., 0.]]),
...: array([[ 1., 0., 0., 0., 0., 1.]])])
Out[37]:
array([[[ 0., 1., 0., 0., 1., 0.]],
[[ 1., 0., 0., 1., 0., 0.]],
...
[[ 1., 0., 0., 0., 0., 1.]]])
In [38]: _.shape
Out[38]: (6, 1, 6)
So we need to be careful how we recreate cases like this.

You should try this:
my_array = my_array.reshape(6,6)
This works when the array above is pasted as is, because the paste yields a (6, 1, 6) array and the reshape removes the extra dimension. Other methods such as vstack and concatenate, as shown in @Divikar's comment above, should work for this purpose as well.
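
The two cases discussed above, a true (6,) object array and a (6, 1, 6) float array, can be told apart with a short check. This is a sketch under the assumption that the inner arrays all have shape (1, 6); the rows here come from np.eye purely for illustration and are not the data in the question.

```python
import numpy as np

# Six arrays of shape (1, 6); illustrative data, not the question's values.
rows = [np.eye(6)[i:i + 1] for i in range(6)]

# Build a genuine (6,) object array by filling it element by element.
A = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    A[i] = row

try:
    A.reshape(6, 6)
    reshape_failed = False
except ValueError:
    # Only 6 elements exist at the outer level, so reshape cannot work.
    reshape_failed = True

# concatenate treats the object array as a sequence of (1, 6) arrays.
joined = np.concatenate(A, axis=0)
print(joined.shape, joined.dtype)  # (6, 6) float64

# A plain np.array() call on the same list yields a 3d float array instead.
pasted = np.array(rows)
print(pasted.shape)  # (6, 1, 6)
```

On the (6, 1, 6) array, pasted.reshape(6, 6) succeeds, which is why the reshape suggestion only applies to the pasted form.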

Appending matrix A with matrix B

Say I have two matrices A and B. For example,
A = np.zeros((5,5))
B = np.eye(5)
Is there a way to append A and B?

It sounds to me like you're looking for np.hstack:
>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> b = np.eye(5)
>>> np.hstack((a, b))
array([[ 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
np.vstack will work if you want to stack them downward:
>>> np.vstack((a, b))
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])
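
Both calls are special cases of np.concatenate with an explicit axis, which can make the direction of the append easier to read. A small sketch, reusing the a and b from above:

```python
import numpy as np

a = np.zeros((5, 5))
b = np.eye(5)

# axis=1 appends columns (side by side), like np.hstack for 2d arrays.
side = np.concatenate((a, b), axis=1)
# axis=0 appends rows (one below the other), like np.vstack for 2d arrays.
down = np.concatenate((a, b), axis=0)

print(side.shape)  # (5, 10)
print(down.shape)  # (10, 5)
```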

Is TensorSharedVariable in Theano initialized twice in a function?

In Theano, once a shared variable is initialized in a function, it will never be initialized again even if the function is called repeatedly. Am I right?
def sgd_updates_adadelta(params,cost,rho=0.95,epsilon=1e-6,norm_lim=9,word_vec_name='Words'):
    updates = OrderedDict({})
    exp_sqr_grads = OrderedDict({})
    exp_sqr_ups = OrderedDict({})
    gparams = []
    for param in params:
        empty = np.zeros_like(param.get_value())
        exp_sqr_grads[param] = theano.shared(value=as_floatX(empty),name="exp_grad_%s" % param.name)
        gp = T.grad(cost, param)
        exp_sqr_ups[param] = theano.shared(value=as_floatX(empty), name="exp_grad_%s" % param.name)
        gparams.append(gp)
In the code above, will the exp_sqr_grads and exp_sqr_ups variables not be initialized with zeros again when the sgd_updates_adadelta function is called a second time?

Shared variables are not static, if that is what you mean. My understanding of your code:
import numpy as np
import theano
import theano.tensor as T
global_list = []
def f():
    a = np.zeros((4, 5), dtype=theano.config.floatX)
    b = theano.shared(a)
    global_list.append(b)
Copy and paste this into an IPython session and then try:
f()
f()
print(global_list)
The list will contain two items. They are not the same object:
In [9]: global_list[0] is global_list[1]
Out[9]: False
And they don't refer to the same memory. Run
global_list[0].set_value(np.arange(20).reshape(4, 5).astype(theano.config.floatX))
Then
In [20]: global_list[0].get_value()
Out[20]:
array([[ 0., 1., 2., 3., 4.],
[ 5., 6., 7., 8., 9.],
[ 10., 11., 12., 13., 14.],
[ 15., 16., 17., 18., 19.]])
In [21]: global_list[1].get_value()
Out[21]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
Having established that initializing shared variables several times leads to different variables, here is how to update a shared variable using a function. We re-use the established shared variables:
s = global_list[1]
x = T.scalar(dtype=theano.config.floatX)
g = theano.function([x], [s], updates=[(s, T.inc_subtensor(s[0, 0], x))])
g now increments the top left value of s by x at every call:
In [7]: s.get_value()
Out[7]:
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
In [8]: g(1)
Out[8]:
[array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])]
In [9]: s.get_value()
Out[9]:
array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
In [10]: g(10)
Out[10]:
[array([[ 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])]
In [11]: s.get_value()
Out[11]:
array([[ 11., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
