Related
I have been using Gaussian Mixture Models (GMM) to model a set of peaks in a 2D numpy array (a).
a = np.array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 100., 1000., 100., 2., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 100., 100., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 2., 1., 2., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
The problem is that in order to fit a GMM to my data with sklearn I have to first generate a density_array, which holds a huge amount of data points depending on the height of the peaks in a.
def convert_to_density_array(array):
"""
Convert an array to a density array
"""
density_list = []
# iterate over each i,j coordinate in the array
for (i, j), value in np.ndenumerate(array):
for x in range(int(value)):
density_list.append((i, j))
return np.array(density_list)
density_array = convert_to_density_array(a)
gmm = mixture.GaussianMixture(n_components=2,covariance_type='full').fit(density_array)
Is there an efficient way of representing a 2D numpy array for the purpose of fitting a GMM to it?
you can store data using less precision by adding dtype=np.float32 to your np.array call, which is okay as long as you are fine with 8 digits of precision instead of 15 (which is totally acceptable in your case), but that's the only way to store the same data in memory in less footprint and still pass it to gmm.
what you are trying to do is curve fitting, not data modelling , so you can use scipy curve fit on your original data without making density_array to start with, you just have to pass it a function of two gaussians and in a loop change the initial estimate randomly until you get the least error, but as writing the code for it will take some time, consider this approach only if you cannot get your data in memory using any other method.
Using DataArray objects in xarray what is the best way to find all cells that have values != 0.
For example in pandas I would do
df.loc[df.col1 > 0]
My specific example I'm trying to look at 3 dimensional brain imaging data.
first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']
Looking at the documentation for xarray.DataArray.where it seems I want something like this:
first_image_xarray.where(first_image_xarray.y + first_image_xarray.x > 0,drop = True)[:,0,0]
But I still get arrays with zeros.
<xarray.DataArray (x: 140)>
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -0., 0., -0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Dimensions without coordinates: x
Also - a side question - why are there some negative zeros? Are these values rounded and -0. is actually equal to something like -0.009876 or something?
(Answer to main question)
You are almost there. However, a slight syntax difference makes a big difference here. On one hand, here is the solution to filter >0 values using a "value-based" mask.
# if you want to DROP values which do not suffice a mask condition
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)
or
# if you want to KEEP values which do not suffice a mask condition as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)
On the other hand, the reason why your attempt did not work as you hoped is because with first_image_xarray.x, it is referring to the index of elements in the array (in x direction) rather than referring to the value of the elements. Thus only the 1st element of your output should be nan instead of 0 because it only does not suffice the mask condition in slice [:,0,0]. Yes, you were creating an "index-based" mask.
The following small experiment (hopefully) articulates this critical difference.
Suppose we have DataArray which consists of only 0 and 1 (dimension is aligned with the original post (OP) of the question (140,140,96)). Firstly let's mask it based on index as OP did:
import numpy as np
import xarray as xr
np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))
# with this "index-based" mask, only elements where index of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]
Out:
<xarray.DataArray (x: 140)>
array([ nan, 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 0., 0.,
1., 0., 1., 1., 0., 0., 1., 0., 0., 1., 1., 1.,
0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 1.])
Dimensions without coordinates: x
With the mask above, only the element where index of both x and y are 0 turns in to nan and the rest has not been changed or dropped at all.
In contrast, the proposed solution masks the DataArray based on the values of DataArray elements.
# with this "value-based" mask, all the values which do not suffice the mask condition are dropped
a[:,0,0].where(a[:,0,0] > 0, drop=True)
Out:
<xarray.DataArray (x: 65)>
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1.])
Dimensions without coordinates: x
This successfully dropped all the values which do not suffice a mask condition based on the values of DataArray elements.
(Answer to side question)
As for the origin of -0 and 0 in DataArray, rounded values from negative or positive side towards 0 would be the possibility: A related discussion was done here How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy? The below is a tiny example of this case.
import numpy as np
import xarray as xr
xr_array = xr.DataArray([-0.1, 0.1])
# you can use either xr.DataArray.round() or np.round() for rounding values of DataArray
xr.DataArray.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
np.round(xr_array)
Out:
<xarray.DataArray (dim_0: 2)>
array([-0., 0.])
Dimensions without coordinates: dim_0
As a side note, the other possibility for getting -0 in NumPy array can be numpy.set_printoptions(precision=0), which hides below decimal point like below (but I know this is not the case this time since you are using DataArray):
import numpy as np
# default value is precision=8 in ver1.15
np.set_printoptions(precision=0)
np.array([-0.1, 0.1])
Out:
array([-0., 0.])
Anyway, My best guess is that the conversion to -0 should be manual and intentional rather than automatic in data preparation & pre-processing phase.
Hope this helps.
There are some signal generation helper functions in python's scipy, but these are only for 1 dimensional signal.
I want to generate a 2-D ideal bandpass filter, which is a matrix of all zeros, with a circle of ones to remove some periodic noise from my image.
I am now doing with:
def unit_circle(r):
def distance(x1, y1, x2, y2):
return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
d = 2*r + 1
mat = np.zeros((d, d))
rx , ry = d/2, d/2
for row in range(d):
for col in range(d):
dist = distance(rx, ry, row, col)
if abs(dist - r) < 0.5:
mat[row, col] = 1
return mat
result:
In [18]: unit_circle(6)
Out[18]:
array([[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.]])
Is there a more direct way to generate a matrix of circle of ones, all else zeros?
Edit:
Python 2.7.12
Here's a vectorized approach -
def unit_circle_vectorized(r):
A = np.arange(-r,r+1)**2
dists = np.sqrt(A[:,None] + A)
return (np.abs(dists-r)<0.5).astype(int)
Runtime test -
In [165]: %timeit unit_circle(100) # Original soln
10 loops, best of 3: 31.1 ms per loop
In [166]: %timeit my_unit_circle(100) ##Eli Korvigo's soln
100 loops, best of 3: 2.68 ms per loop
In [167]: %timeit unit_circle_vectorized(100)
1000 loops, best of 3: 582 µs per loop
Here is a pure NumPy alternative that should run significantly faster and looks cleaner, imho. Basically, we vectorise your code by replacing built-in sqrt and abs with their NumPy alternatives and working on matrices of indices.
Updated to replace distance with np.hypot(courtesy of James K)
In [5]: import numpy as np
In [6]: def my_unit_circle(r):
...: d = 2*r + 1
...: rx, ry = d/2, d/2
...: x, y = np.indices((d, d))
...: return (np.abs(np.hypot(rx - x, ry - y)-r) < 0.5).astype(int)
...:
In [7]: my_unit_circle(6)
Out[7]:
array([[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 1., 1., 1., 0., 0., 0., 0.]])
Benchmarks
In [12]: %timeit unit_circle(100)
100 loops, best of 3: 17.7 ms per loop
In [13]: %timeit my_unit_circle(100)
1000 loops, best of 3: 480 µs per loop
result of code execution
def gen_circle(img: np.ndarray, center: tuple, diameter: int) -> np.ndarray:
"""
Creates a matrix of ones filling a circle.
"""
# gets the radious of the image
radious = diameter//2
# gets the row and column center of the image
row, col = center
# generates theta vector to variate the angle
theta = np.arange(0, 360)*(np.pi/180)
# generates the indexes of the column
y = (radious*np.sin(theta)).astype("int32")
# generates the indexes of the rows
x = (radious*np.cos(theta)).astype("int32")
# with:
# img[x, y] = 1
# you can draw the border of the circle
# instead of the inner part and the border.
# centers the circle at the input center
rows = x + (row)
cols = y + (col)
# gets the number of rows and columns to make
# to cut by half the execution
nrows = rows.shape[0]
ncols = cols.shape[0]
# makes a copy of the image
img_copy = copy.deepcopy(img)
# We use the simetry in our favour
# does reflection on the horizontal axes
# and in the vertical axes
for row_down, row_up, col1, col2 in zip(rows[:nrows//4],
np.flip(rows[nrows//4:nrows//2]),
cols[:ncols//4],
cols[nrows//2:3*ncols//4]):
img_copy[row_up:row_down, col2:col1] = 1
return img_copy
center = (30,40)
ones = np.zeros((center[0]*2, center[1]*2))
diameter = 30
circle = gen_circle(ones, center, diameter)
plt.imshow(circle)
I have outputs from a process that produce a data trend as seen below:
The data output seems to have a trend with the diagonals, however I am unsure on how I can track this. Ultimately, I know the first 15 numbers in each 16 number sample, and want to predict the 16th. It seems like you should be able to do this with some type of approximation that involves matrix math or possible phase shift in a Fourier series. Is there a method that could achieve this? If there is a solution that can be used via Python that would be preferred.
you can use my diagonal detection matrix, it was developed for a similar issue, some times, it is referred to by Omran Matrix. All you need, is to multiply the image (your matrix) with my matrix, and summate the first row of the output, which will give you the number of diagonals in the image. The matrix is also very flexible and can be a vertical rectangular matrix, I used some tricks in the physical meaning to inverse it. I developed it in 2010 in Zurich, while doing my PhD to detect diagonal lines or overtones in sweeps in visual sound images. the matrix is published in Detecting diagonal activity to quantify harmonic structure preservation with cochlear implant mapping or formal link. The PhD thesis is called, mechanism of music perception using cochlear implants, University of Zurich, 2011 by Sherif Omran. If you write a paper, please cite me and good luck
here are similar images with overtones, I used my matrix to detect these diagonal activities, which look very near to yours.
Here is an example of how to check whether opposite diagonals contain only 1s, like in your case:
In [52]: from scipy.sparse import eye
let's create a matrix with a opposite diagonal
In [53]: a = np.fliplr(eye(5, 8, k=1).toarray())
In [54]: a
Out[54]:
array([[ 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.]])
Flip array in the left/right direction
In [55]: f = np.fliplr(a)
In [56]: f
Out[56]:
array([[ 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.]])
the same can be done:
In [71]: a[::-1,:]
Out[71]:
array([[ 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 1., 0.]])
get given diagonal
In [57]: np.diag(f, k=1)
Out[57]: array([ 1., 1., 1., 1., 1.])
In [58]: np.diag(f, k=-1)
Out[58]: array([ 0., 0., 0., 0.])
In [111]: a[::-1].diagonal(2)
Out[111]: array([ 1., 1., 1., 1., 1.])
check whether the whole diagonal contains 1s
In [61]: np.all(np.diag(f, k=1) == 1)
Out[61]: True
or
In [64]: (np.diag(f, k=1) == 1).all()
Out[64]: True
In [65]: (np.diag(f, k=0) == 1).all()
Out[65]: False
This answer will help you to find all diagonals
PS i'm a newbie in numpy, so i'm pretty sure there must be faster and more elegant solutions
I am trying to use reduce() function to create a function hstack() which horizontally stacks multiple arrays. As a simple example, lets say
>>>>M=eye((4))
>>>>M
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
>>>>hstack([M,M])
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])
This works as I want. Now I define
>>>> hstackm = lambda *args: reduce(hstack, args)
And try to do the hstack() from the previous case
>>>>hstackm([M,M])
[array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]]),
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])]
Which is incorrect. How do I define hstackm() to obtain a proper output?
My final objective will be to create a hstackm() function to stack SPARSE matrices if it is possible. Something like,
hstackm = lambda *args: reduce(sparse.hstack, args).
The _*args_ would be csr or _lil_matrix_
thank you
In [16]: hstackm = lambda args: reduce(lambda x,y:hstack((x,y)), args)
In [17]: hstackm([M,M])
Out[17]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.]])
Your function hstack takes one parameter, a list of matrices. reduce() calls it with two parameters instead, each a matrix.
Change your hstack method to accept an arbitrary number of arguments instead:
def hstack(*matrices):
....
instead of hstack(matrices), then call it as hstack(M, M).