Sparse DataArray Xarray search - python

Using DataArray objects in xarray, what is the best way to find all cells that have values != 0?
For example, in pandas I would do
df.loc[df.col1 > 0]
In my specific case, I'm looking at 3-dimensional brain imaging data.
first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']
Looking at the documentation for xarray.DataArray.where it seems I want something like this:
first_image_xarray.where(first_image_xarray.y + first_image_xarray.x > 0,drop = True)[:,0,0]
But I still get arrays with zeros.
<xarray.DataArray (x: 140)>
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -0., 0., -0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Dimensions without coordinates: x
Also, a side question: why are there some negative zeros? Are these values rounded, so that -0. is actually something like -0.009876?

(Answer to main question)
You are almost there, but a small difference in syntax makes a big difference here. First, here is how to filter values > 0 using a "value-based" mask.
# if you want to DROP values that do not satisfy the mask condition
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)
or
# if you want to KEEP values that do not satisfy the mask condition, as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)
Second, your attempt did not work as you hoped because first_image_xarray.x refers to the index of the elements along the x dimension, not to their values. In the slice [:,0,0] the y index is always 0, so the condition x + y > 0 fails only at x = 0, and only the first element of your output should be nan instead of 0. In other words, you were creating an "index-based" mask.
The following small experiment (hopefully) illustrates this critical difference.
Suppose we have a DataArray that contains only 0 and 1, with the same shape as in the original post (OP), (140, 140, 96). First, let's mask it based on index, as OP did:
import numpy as np
import xarray as xr
np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))
# with this "index-based" mask, only elements where the indices of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]
Out:
<xarray.DataArray (x: 140)>
array([ nan, 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 0., 0.,
1., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.,
0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 1.])
Dimensions without coordinates: x
With the mask above, only the elements where the indices of both x and y are 0 turn into nan; the rest are neither changed nor dropped.
In contrast, the proposed solution masks the DataArray based on the values of its elements.
# with this "value-based" mask, all values that do not satisfy the mask condition are dropped
a[:,0,0].where(a[:,0,0] > 0, drop=True)
Out:
<xarray.DataArray (x: 65)>
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dimensions without coordinates: x
This successfully drops all values that do not satisfy the mask condition, which is based on the values of the DataArray elements.
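To come back to the original goal of finding every non-zero cell in the full 3-D array rather than in one slice, the same value-based idea can be sketched as follows. This is only a minimal sketch on a tiny made-up array: the names da, positions, values and masked are illustrative and not from the question, the condition is != 0 as in the question's first sentence, and the index lookup drops down to NumPy.

```python
import numpy as np
import xarray as xr

# tiny stand-in for the (140, 140, 96) image: all zeros except two cells
data = np.zeros((4, 3, 2))
data[1, 2, 0] = 5.0
data[3, 0, 1] = -2.0
da = xr.DataArray(data, dims=("x", "y", "z"))

# value-based boolean mask over the whole array
mask = da.values != 0

# (x, y, z) index triples of the non-zero cells, one row per cell
positions = np.argwhere(mask)

# the non-zero values themselves, in the same order as positions
values = da.values[mask]

# same shape as da, with every zero cell replaced by nan
masked = da.where(da != 0)
```

Here positions and values are the closest analogue of the pandas df.loc[...] filter, while masked keeps the original shape.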
(Answer to side question)
As for the origin of -0 and 0 in the DataArray, one possibility is that the values were rounded towards 0 from the negative or positive side. A related discussion is here: How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy? Below is a tiny example of this case.
import numpy as np
import xarray as xr
xr_array = xr.DataArray([-0.1, 0.1])
# you can use either xr.DataArray.round() or np.round() for rounding values of DataArray
xr.DataArray.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
np.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
As a side note, another way to end up seeing -0 in a NumPy array is numpy.set_printoptions(precision=0), which hides the digits after the decimal point as shown below (but I know this is not the case here, since you are using a DataArray):
import numpy as np
# default value is precision=8 in ver1.15
np.set_printoptions(precision=0)
np.array([-0.1, 0.1])
Out:
array([-0., 0.])
Anyway, my best guess is that the conversion to -0 was manual and intentional rather than automatic, and happened in the data preparation and pre-processing phase.
Hope this helps.
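To check whether the -0. entries really are zeros, and to get rid of the sign if it is unwanted, something like the following could be used. It is a small sketch on the same two-element example as above; the variable names are illustrative.

```python
import numpy as np

rounded = np.round(np.array([-0.1, 0.1]))

# -0. and 0. compare equal to zero; only the sign bit differs
all_equal_zero = bool(np.all(rounded == 0.0))
sign_bits = np.signbit(rounded)

# adding 0.0 turns -0. into 0. and leaves every other value unchanged
cleaned = rounded + 0.0
```

If the comparison with 0.0 is False for some of your -0. entries, they are small negative numbers that are only displayed as -0.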

Related

Adding zeros in between elements in numpy array with (a,b,c) shape

I have a numpy array like this
array([[[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]]])
shape = (2, 3, 5)
And I want an output which looks like this
output = array([[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]],
[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]]])
Note: The number of zeros to insert varies with a given factor. In this case the factor is k=3, and (k-1) zeros are inserted, so two zeros go between each pair of adjacent numbers. Also, given this output, I would like to be able to get back to the initial input.
You can use numpy.zeros to initialize an output array of the desired shape, then indexing to fill the values:
k = 3
shape = a.shape
output = np.zeros(shape[:-1]+((shape[-1]-1)*k+1,))
output[...,::k] = a
output:
array([[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]],
[[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]]])
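The question also asks how to get back from the padded output to the initial input. Since the original values sit at every k-th position along the last axis, the same strided slice can be read instead of written. A short sketch, using a made-up input with distinct values so the round trip is visible:

```python
import numpy as np

k = 3
# illustrative input with distinct values, same shape as in the question
a = np.arange(1.0, 31.0).reshape(2, 3, 5)

# forward: insert (k - 1) zeros between neighbours along the last axis
output = np.zeros(a.shape[:-1] + ((a.shape[-1] - 1) * k + 1,))
output[..., ::k] = a

# inverse: keep every k-th element along the last axis
recovered = output[..., ::k]
```

Note that recovered is a view into output; call .copy() on it if an independent array is needed.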

Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?

I have been using Gaussian Mixture Models (GMM) to model a set of peaks in a 2D numpy array (a).
a = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 100., 1000., 100., 2., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 100., 100., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 2., 1., 2., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
The problem is that in order to fit a GMM to my data with sklearn I have to first generate a density_array, which holds a huge amount of data points depending on the height of the peaks in a.
from sklearn import mixture
def convert_to_density_array(array):
    """
    Convert an array to a density array
    """
    density_list = []
    # iterate over each i,j coordinate in the array
    for (i, j), value in np.ndenumerate(array):
        # repeat the coordinate once per unit of peak height
        for x in range(int(value)):
            density_list.append((i, j))
    return np.array(density_list)
density_array = convert_to_density_array(a)
gmm = mixture.GaussianMixture(n_components=2,covariance_type='full').fit(density_array)
Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?
You can store the data with less precision by adding dtype=np.float32 to your np.array call. That is fine as long as you can live with about 7 significant digits instead of about 15 (which is entirely acceptable in your case), but it is the only way to store the same data in a smaller memory footprint and still pass it to the GMM.
What you are trying to do is curve fitting, not data modelling, so you can run scipy.optimize.curve_fit on your original data without building density_array in the first place. You just have to pass it a function of two Gaussians and, in a loop, change the initial estimate randomly until you get the least error. As writing the code for it will take some time, consider this approach only if you cannot fit your data in memory by any other method.
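Separately from the two suggestions above, the double Python loop in convert_to_density_array can be replaced by np.argwhere plus np.repeat. This does not shrink the resulting array, it only builds it faster; the function name below is made up for illustration, and the small test array is not the one from the question.

```python
import numpy as np

def convert_to_density_array_vectorized(array):
    """Repeat each (i, j) coordinate int(value) times, without Python loops."""
    # same truncation as int(value) in the loop version
    counts = np.asarray(array).astype(int)
    positive = counts > 0
    # coordinates of the cells that contribute at least one point, in row-major order
    coords = np.argwhere(positive)
    # repeat every coordinate row by its own count
    return np.repeat(coords, counts[positive], axis=0)

small = np.array([[0., 2., 0.],
                  [1., 0., 3.]])
density = convert_to_density_array_vectorized(small)
```

The rows of density come out in the same order as in the loop version, because both walk the array in row-major order.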

Concatenate arrays inside of an array - Value Error: Zero dimension arrays cannot be concatenated

I'm trying to append to an existing dataframe 3 new columns which should encode dummy variables.
To do this I am creating a function that looks through the array to be "dummied" and, for each 'hit', assigns the corresponding value to a new row.
import numpy as np
import pandas as pd
iriss = np.concatenate((np.array(['setosa']*50), np.array(['versicolor']*50), np.array(['virginica']*50)), axis = 0)
In this case I present the Iris data set's species column, with 150 equally distributed species (50 units per species).
def one_hot_coding():
    one_hot_column = np.array([], dtype = 'int8')
    for one_hot in iriss:
        #my idea here is to find the 'hit = species' and to then for each 'hit' to assign to these
        # three different np.arrays the value of one or zero
        if one_hot == 'setosa':
            one_hot_setosa = np.append(one_hot_column, 1)
            one_hot_versicolor = np.append(one_hot_column, 0)
            one_hot_virginica = np.append(one_hot_column, 0)
        elif one_hot == 'versicolor':
            one_hot_setosa = np.append(one_hot_column, 0)
            one_hot_versicolor = np.append(one_hot_column, 1)
            one_hot_virginica = np.append(one_hot_column, 0)
        else:
            one_hot_setosa = np.append(one_hot_column, 0)
            one_hot_versicolor = np.append(one_hot_column, 0)
            one_hot_virginica = np.append(one_hot_column, 1)
        one_hot_setosa = np.concatenate((one_hot_setosa), axis = 0)
        print(one_hot_setosa)
one_hot_coding()
Results Discussion:
To make it easier I will only talk about one_hot_setosa:
When I call print on one_hot_setosa, 150 lines appear, where the first 50 lines are [1]'s and the latter 100 are [0]'s.
[1] [1] ...48 [0] [0] ...48 [0] [0] ... 48
From what I see here my results are 150 independent arrays inside of the array called one_hot_setosa.
When I try to concatenate them all to obtain a single array (i.e. the iriss array created to house the 150 units) I get the following error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-107-6f70367ed6eb> in <module>
24
25 print(one_hot_setosa)
---> 26 one_hot_coding()
<ipython-input-107-6f70367ed6eb> in one_hot_coding()
21 one_hot_virginica = np.append(one_hot_column, 1)
22
---> 23 one_hot_setosa = np.concatenate((one_hot_setosa), axis = 0)
24
25 print(one_hot_setosa)
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: zero-dimensional arrays cannot be concatenated
So this error seems to be telling me that I don't actually have 150 arrays, or better said an array with array.shape = (150, 1) (which is what I want), but that my array actually contains nothing? Why is this?
Okay, so I fought with this a lot, but I managed to solve the problem.
np.concatenate adds rows to the array, so I specified that I wanted to add rows to the one_hot_flower. Also, as @hpaulj pointed out, square brackets are very important here. Since my arrays are 1-D arrays, I only needed to wrap the value I wanted to concatenate in square brackets once.
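The point about brackets can be reproduced in isolation. In the sketch below (variable names are illustrative), (one) is not a tuple but just the array itself, so np.concatenate iterates over its scalar elements, which are zero-dimensional; passing a real sequence of 1-D arrays works.

```python
import numpy as np

one = np.array([1])

# (one) is the same object as one, so concatenate receives a single 1-D array
# and treats each of its 0-d elements as an array to join
try:
    np.concatenate((one), axis=0)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# a tuple of 1-D arrays is what concatenate expects
joined = np.concatenate((one, np.array([0])), axis=0)
```

So the array does not "contain nothing"; the call simply never received a sequence of arrays.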
All in all the final code looks like this:
import numpy as np
import pandas as pd
iriss = np.concatenate((np.array(['setosa']*50), np.array(['versicolor']*50), np.array(['virginica']*50)), axis = 0)
one_hot_setosa = np.array([])
one_hot_versicolor = np.array([])
one_hot_virginica = np.array([])
def one_hot_coding():
    global one_hot_setosa
    global one_hot_versicolor
    global one_hot_virginica
    for one_hot in iriss:
        if one_hot == 'setosa':
            one_hot_setosa = np.concatenate((one_hot_setosa, np.array([1])), axis = 0)
            one_hot_versicolor = np.concatenate((one_hot_versicolor, np.array([0])), axis = 0)
            one_hot_virginica = np.concatenate((one_hot_virginica, np.array([0])), axis = 0)
        elif one_hot == 'versicolor':
            one_hot_setosa = np.concatenate((one_hot_setosa, np.array([0])), axis = 0)
            one_hot_versicolor = np.concatenate((one_hot_versicolor, np.array([1])), axis = 0)
            one_hot_virginica = np.concatenate((one_hot_virginica, np.array([0])), axis = 0)
        else:
            one_hot_setosa = np.concatenate((one_hot_setosa, np.array([0])), axis = 0)
            one_hot_versicolor = np.concatenate((one_hot_versicolor, np.array([0])), axis = 0)
            one_hot_virginica = np.concatenate((one_hot_virginica, np.array([1])), axis = 0)
    #one_hot_setosa = np.concatenate((one_hot_setosa), axis = 0)
    return one_hot_setosa, one_hot_versicolor, one_hot_virginica
one_hot_coding()
[Out]:
(array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))
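As an alternative to growing three arrays inside a loop, the same dummy columns can be obtained with one broadcast comparison. This is a sketch of a different technique than the one used above, not a correction of it; the names species and one_hot are illustrative.

```python
import numpy as np

iriss = np.concatenate((np.array(['setosa'] * 50),
                        np.array(['versicolor'] * 50),
                        np.array(['virginica'] * 50)), axis=0)

# sorted unique labels: setosa, versicolor, virginica
species = np.unique(iriss)

# compare every row label with every species; shape (150, 3)
one_hot = (iriss[:, None] == species).astype(np.int8)

# individual dummy columns, if they are needed separately
one_hot_setosa = one_hot[:, 0]
one_hot_versicolor = one_hot[:, 1]
one_hot_virginica = one_hot[:, 2]
```

The columns of one_hot follow the order of species, so they can be attached to a dataframe under those names.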

How to generate a matrix with circle of ones in numpy/scipy

There are some signal generation helper functions in Python's scipy, but these are only for 1-dimensional signals.
I want to generate a 2-D ideal bandpass filter, which is a matrix of all zeros with a circle of ones, to remove some periodic noise from my image.
I am currently doing it with:
import math
import numpy as np
def unit_circle(r):
    def distance(x1, y1, x2, y2):
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    d = 2*r + 1
    mat = np.zeros((d, d))
    # d/2 is integer division on Python 2 (used in this post); on Python 3 write d // 2
    rx , ry = d/2, d/2
    for row in range(d):
        for col in range(d):
            dist = distance(rx, ry, row, col)
            if abs(dist - r) < 0.5:
                mat[row, col] = 1
    return mat
result:
In [18]: unit_circle(6)
Out[18]:
array([[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.]])
Is there a more direct way to generate a matrix of circle of ones, all else zeros?
Edit:
Python 2.7.12
Here's a vectorized approach -
def unit_circle_vectorized(r):
    A = np.arange(-r,r+1)**2
    dists = np.sqrt(A[:,None] + A)
    return (np.abs(dists-r)<0.5).astype(int)
Runtime test -
In [165]: %timeit unit_circle(100) # Original soln
10 loops, best of 3: 31.1 ms per loop
In [166]: %timeit my_unit_circle(100) ##Eli Korvigo's soln
100 loops, best of 3: 2.68 ms per loop
In [167]: %timeit unit_circle_vectorized(100)
1000 loops, best of 3: 582 µs per loop
Here is a pure NumPy alternative that should run significantly faster and looks cleaner, imho. Basically, we vectorise your code by replacing the built-in sqrt and abs with their NumPy alternatives and working on matrices of indices.
Updated to replace distance with np.hypot (courtesy of James K).
In [5]: import numpy as np
In [6]: def my_unit_circle(r):
   ...:     d = 2*r + 1
   ...:     rx, ry = d/2, d/2
   ...:     x, y = np.indices((d, d))
   ...:     return (np.abs(np.hypot(rx - x, ry - y)-r) < 0.5).astype(int)
   ...:
In [7]: my_unit_circle(6)
Out[7]:
array([[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.]])
Benchmarks
In [12]: %timeit unit_circle(100)
100 loops, best of 3: 17.7 ms per loop
In [13]: %timeit my_unit_circle(100)
1000 loops, best of 3: 480 µs per loop
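The sessions above were run on Python 2.7, where d/2 is integer division. On Python 3 the same expression gives a half-integer centre and a different ring, so a Python 3 version would need the centre spelled out explicitly. A minimal sketch under that assumption (the function name is made up, and the centre of a (2r+1) x (2r+1) grid is taken to be index r):

```python
import numpy as np

def unit_circle_py3(r):
    """Ring of ones of radius r on a (2r+1) x (2r+1) grid of zeros."""
    d = 2 * r + 1
    # row and column index of every cell
    rows, cols = np.indices((d, d))
    # distance of every cell from the integer centre (r, r)
    dist = np.hypot(rows - r, cols - r)
    # keep the cells within half a pixel of the circle
    return (np.abs(dist - r) < 0.5).astype(int)

ring = unit_circle_py3(6)
```

For r = 6 this yields the same 13 x 13 pattern as the outputs shown above, as integers rather than floats.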
[Figure: result of code execution]
import copy
import numpy as np
import matplotlib.pyplot as plt
def gen_circle(img: np.ndarray, center: tuple, diameter: int) -> np.ndarray:
    """
    Creates a matrix of ones filling a circle.
    """
    # gets the radius of the circle
    radius = diameter//2
    # gets the row and column center of the circle
    row, col = center
    # generates the theta vector to vary the angle
    theta = np.arange(0, 360)*(np.pi/180)
    # generates the column offsets
    y = (radius*np.sin(theta)).astype("int32")
    # generates the row offsets
    x = (radius*np.cos(theta)).astype("int32")
    # with:
    # img[x, y] = 1
    # you can draw the border of the circle
    # instead of the inner part and the border.
    # centers the circle at the input center
    rows = x + (row)
    cols = y + (col)
    # gets the number of rows and columns, used below
    # to cut the execution in half
    nrows = rows.shape[0]
    ncols = cols.shape[0]
    # makes a copy of the image
    img_copy = copy.deepcopy(img)
    # We use the symmetry in our favour:
    # reflect about the horizontal axis
    # and about the vertical axis
    for row_down, row_up, col1, col2 in zip(rows[:nrows//4],
                                            np.flip(rows[nrows//4:nrows//2]),
                                            cols[:ncols//4],
                                            cols[nrows//2:3*ncols//4]):
        img_copy[row_up:row_down, col2:col1] = 1
    return img_copy
center = (30,40)
ones = np.zeros((center[0]*2, center[1]*2))
diameter = 30
circle = gen_circle(ones, center, diameter)
plt.imshow(circle)
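For a filled circle like the one gen_circle draws, a distance mask in the style of the vectorized answers above is another option. The sketch below is an alternative technique rather than a rewrite of gen_circle; the function name and its parameters are illustrative.

```python
import numpy as np

def disk_mask(shape, center, radius):
    """Array of the given shape with ones inside the disk and zeros elsewhere."""
    rows, cols = np.indices(shape)
    # distance of every cell from the requested centre
    dist = np.hypot(rows - center[0], cols - center[1])
    return (dist <= radius).astype(int)

# same canvas, centre and diameter as in the example above
center = (30, 40)
disk = disk_mask((center[0] * 2, center[1] * 2), center, 30 // 2)
```

The edge pixels may differ slightly from gen_circle's output, since that function builds the disk from rectangles rather than from a distance test.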

Sparse Construct: Repeating Identity

Say I have the following two matrices, with ij being large (e.g. 5000):
E = np.identity((ij))
oneVector = np.ones((1, ij))
and I need to compute
np.kron(E, oneVector)
This is quite slow and inefficient. Basically, the Kronecker product of identity and a row vector of ones is repeating the identity matrix horizontally oneVector.size times.
I believe that creating a sparse product would make more sense. scipy.sparse.kron would allow me to create that product if I had both A, B as sparse. But I don't know how to create the vector of ones as a "sparse type" matrix.
Is there a simple way to generate the sparse equivalent of np.ones() or is there another way I should proceed?
The arguments to scipy.sparse.kron do not have to be sparse.
In [31]: import numpy as np
In [32]: import scipy.sparse as sp
In [33]: ij = 4
In [34]: E = sp.identity(ij) # Sparse identity matrix
In [35]: oneVector = np.ones((1, ij)) # Dense
In [36]: m = sp.kron(E, oneVector) # m is sparse.
In [37]: m
Out[37]:
<4x16 sparse matrix of type '<type 'numpy.float64'>'
with 16 stored elements (blocksize = 1x4) in Block Sparse Row format>
In [38]: m.A
Out[38]:
array([[ 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1.]])
P.S. Based on this comment:
Basically, the Kronecker product of identity and a row vector of ones is repeating the identity matrix horizontally oneVector.size times.
I wonder if you meant kron(oneVector, E):
In [39]: m = sp.kron(oneVector, E)
In [40]: m.A
Out[40]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 1.]])
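Both layouts can be built directly in a sparse format and checked against their dense counterparts on a small size. This is a sketch with n = 4 (n, blocks and tiled are illustrative names); it uses .toarray() for the dense view, and sp.hstack as one possible way to repeat the identity side by side.

```python
import numpy as np
import scipy.sparse as sp

n = 4
E = sp.identity(n, format="csr")   # sparse identity
ones_row = np.ones((1, n))         # dense row of ones

# kron(E, ones_row): row i has n consecutive ones starting at column i * n
blocks = sp.kron(E, ones_row, format="csr")

# identity repeated n times horizontally, i.e. kron(ones_row, E)
tiled = sp.hstack([E] * n, format="csr")

blocks_dense = blocks.toarray()
tiled_dense = tiled.toarray()
```

Only the n * n non-zero entries are stored in either case, which is what makes this workable for ij around 5000.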
