Outer minimum vectorization in numpy follow up - python

This is a follow-up to my previous question.
Given an NxM matrix A, I want to efficiently obtain the NxN matrix whose ith row is the sum along the 2nd axis of the result of applying np.minimum between A and the ith row of A.
Using a for loop,
> A = np.array([[1, 2], [3, 4], [5,6]])
> output = np.zeros(shape=(A.shape[0], A.shape[0]))
> for i in range(A.shape[0]):
    output[i] = np.sum(np.minimum(A, A[i]), axis=1)
> output
array([[ 3., 3., 3.],
[ 3., 7., 7.],
[ 3., 7., 11.]])
Is it possible to optimize this further without the for loop?
Edit: I would also like to do it without allocating an NxNxM tensor because of memory constraints.

You can use broadcasting instead of a for loop. Using the NumPy minimum and sum functions, you can compute the desired matrix output as follows (note that this builds an intermediate NxNxM array before summing):
output = np.sum(np.minimum(A[:, None], A), axis=2)
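Since the question's edit asks to avoid the large intermediate tensor, here is a minimal sketch of one possible compromise: loop over the M columns rather than the N rows and accumulate an NxN outer minimum per column. The function name outer_min_sum is hypothetical, and this is only likely to help when M is small compared to N.

```python
import numpy as np

def outer_min_sum(A):
    """Sum of element-wise minima between every pair of rows of A."""
    n, m = A.shape
    out = np.zeros((n, n), dtype=A.dtype)
    for k in range(m):
        col = A[:, k]
        # N x N matrix of pairwise minima for column k only,
        # so peak extra memory is one N x N array
        out += np.minimum.outer(col, col)
    return out

A = np.array([[1, 2], [3, 4], [5, 6]])
output = outer_min_sum(A)
print(output)
```

For the example matrix this should agree with both the for loop in the question and the broadcasting one-liner.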

Related

numpy slices using column dependent end index from an integer array

If I have an array and I apply summation
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
np.sum(arr,axis=1)
I get the total along the three rows ([4.,9.,15.])
My complication is that arr contains data that may be bad after a certain column index. I have an integer array that tells me how many "good" values I have in each row and I want to sum/average over the good values. Say:
ngoodcols=np.array([0,1,2])
np.sum(arr[:,0:ngoodcols],axis=1) # not legit but this is the idea
It is clear how to do this in a loop, but is there a way to sum only that many values per row, producing [0.,2.,9.], without resorting to looping? Equivalently, I could use nansum if I knew how to set the elements at column indexes at or beyond the values in ngoodcols to np.nan, but that is nearly the same problem as far as slicing is concerned.
One possibility is to use masked arrays:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)
print(arr_masked)
# [[-- -- --]
# [2.0 -- --]
# [4.0 5.0 --]]
print(arr_masked.sum(1))
# [-- 2.0 9.0]
Note that here, when a row has no good values, you get a "missing" value as the result, which may or may not be useful for you. A masked array also lets you easily do other operations that apply only to valid values (mean, etc.).
Another simple option is to just multiply by the mask:
import numpy as np
arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
print((arr * ~mask).sum(1))
# [0. 2. 9.]
Here when there are no good values you just get zero.
Here is one way using Boolean indexing. This sets the elements whose column index is greater than or equal to the corresponding value in ngoodcols to np.nan and then uses np.nansum:
import numpy as np
arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
ngoodcols = np.array([0,1,2])
arr[np.asarray(ngoodcols)[:,None] <= np.arange(arr.shape[1])] = np.nan
print(np.nansum(arr, axis=1))
# [ 0. 2. 9.]
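The question also mentions averaging over the good values. As a rough sketch building on the same mask idea, np.where can zero out the bad entries without modifying arr, and the per-row counts give the mean. Unlike multiplying by the mask, np.where does not propagate a NaN or inf sitting in a bad column. The variable names good, sums, counts and means are illustrative only.

```python
import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6.]])
ngoodcols = np.array([0, 1, 2])

# True where the column index is below the row's number of good values
good = np.arange(arr.shape[1]) < ngoodcols[:, np.newaxis]

# Replace bad entries by 0 before summing; arr itself is left untouched
sums = np.where(good, arr, 0.0).sum(axis=1)

# Mean over good values only; rows with no good values are left as NaN
counts = good.sum(axis=1)
means = np.divide(sums, counts, out=np.full_like(sums, np.nan), where=counts > 0)

print(sums)
print(means)
```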

Python : Mapping values to other values without gap

I have the following question. Is there some kind of method in numpy or scipy that I can use to turn a given unsorted array like this
a = np.array([0,0,1,1,4,4,4,4,5,1891,7]) #could be any number here
into one where the numbers are mapped so that there are no gaps between the values and their relative order is preserved?
[0,0,1,1,2,2,2,2,3,5,4]
EDIT
Is it furthermore possible to swap/shuffle the numbers after the mapping, so that
[0,0,1,1,2,2,2,2,3,5,4]
become something like:
[0,0,3,3,5,5,5,5,4,1,2]
Edit: I'm not sure what the etiquette is here (should this be a separate answer?), but this is actually directly obtainable from np.unique.
>>> u, indices = np.unique(a, return_inverse=True)
>>> indices
array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])
Original answer: This isn't too hard to do in plain python by building a dictionary of what index each value of the array would map to:
x = np.unique(a)  # np.unique already returns the unique values sorted
index_dict = {j: i for i, j in enumerate(x)}
[index_dict[i] for i in a]
It seems you need a dense ranking of your array, in which case you can use scipy.stats.rankdata:
from scipy.stats import rankdata
rankdata(a, 'dense')-1
# array([ 0., 0., 1., 1., 2., 2., 2., 2., 3., 5., 4.])
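For the second part of the question (shuffling the labels after the mapping), one possible approach is to draw a random permutation of the gap-free labels and index it with the inverse array from np.unique. This is a sketch only; the seed and the names perm and shuffled are assumptions, and the exact shuffled labels depend on the random generator.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 4, 4, 4, 4, 5, 1891, 7])

# Gap-free labels 0..k-1, in the order of the sorted unique values
u, indices = np.unique(a, return_inverse=True)

# Random relabelling: perm[j] is the new label for old label j
rng = np.random.default_rng(0)  # fixed seed only to make runs repeatable
perm = rng.permutation(len(u))
shuffled = perm[indices]

print(indices)
print(shuffled)
```

Equal input values still share a label after shuffling, and the set of labels is still 0 to k-1 with no gaps.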

Combine numpy arrays to form a matrix

This seems like it should be straightforward, but I can't figure it out.
Data source is a two column, comma delimited input file with these contents:
6,10
5,9
8,13
...
And my code is:
import numpy as np
data = np.loadtxt("data.txt", delimiter=",")
m = len(data)
x = np.reshape(data[:,0], (m,1))
y = np.ones((m,1))
z = np.matrix([x,y])
Which gives me this error:
Users/acpigeon/.virtualenvs/ipynb/lib/python2.7/site-packages/numpy-1.9.0.dev_297f54b-py2.7-macosx-10.9-intel.egg/numpy/matrixlib/defmatrix.pyc in __new__(subtype, data, dtype, copy)
270 shape = arr.shape
271 if (ndim > 2):
--> 272 raise ValueError("matrix must be 2-dimensional")
273 elif ndim == 0:
274 shape = (1, 1)
ValueError: matrix must be 2-dimensional
No amount of reshaping seems to get this to work, so I'm either missing something really simple or there's a better way to do this.
EDIT:
Would have been helpful to specify the output I am looking for. Here is a line of code that generates the desired result:
In [1]: np.matrix([[5,1],[6,1],[8,1]])
Out[1]:
matrix([[5, 1],
[6, 1],
[8, 1]])
The desired output can be generated this way:
In [12]: np.array((data[:, 0], np.ones(m))).transpose()
Out[12]:
array([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])
The above is copied from ipython and so has ipython style prompts.
Answer to previous version
To eliminate the error, replace:
x = np.reshape(data[:, 0], (m, 1))
with:
x = data[:, 0]
The former line produces a 2-dimensional array of shape (m, 1); putting two such arrays in a list gives np.matrix a 3-dimensional input, and that is what causes the error message. The latter produces a 1-D array with the same data.
Or how about first turning the array into a matrix, and then changing the last column to 1?
In [2]: data=np.loadtxt('stack23859379.txt',delimiter=',')
In [3]: np.matrix(data)
Out[3]:
matrix([[ 6., 10.],
[ 5., 9.],
[ 8., 13.]])
In [4]: z = np.matrix(data)
In [5]: z[:,1]=1
In [6]: z
Out[6]:
matrix([[ 6., 1.],
[ 5., 1.],
[ 8., 1.]])
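Another way to get the same result, sketched here with np.column_stack, is to stack the first column and a column of ones directly. The data array below is typed in by hand as a stand-in for the np.loadtxt call, which is an assumption made so the snippet runs without the input file.

```python
import numpy as np

# Stand-in for np.loadtxt("data.txt", delimiter=",") on the three sample rows
data = np.array([[6., 10.], [5., 9.], [8., 13.]])
m = len(data)

# column_stack turns each 1-D input into a column of a 2-D array
z = np.column_stack((data[:, 0], np.ones(m)))
print(z)
```

If a np.matrix is really needed, the result can be wrapped with np.matrix(z).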

Correlation matrix in NumPy with NaN's

I have an n x m matrix in which row i represents the time series of the variable V_i. I would like to compute the n x n correlation matrix M, where M_{i,j} contains the correlation coefficient (Pearson's r) between V_i and V_j.
However, when I try the following in numpy:
numpy.corrcoef(numpy.matrix('5 6 7; 1 1 1'))
I get the following output:
array([[ 1., nan],
[ nan, nan]])
It seems that numpy.corrcoef doesn't like constant vectors (such as 1 1 1), because if I change the second row to 7 6 5, I get the expected result:
array([[ 1., -1.],
[ -1., 1.]])
What is the reason for this kind of behavior of numpy.corrcoef?
leewangzhong (in the comments) is correct: Pearson's r is not defined for a constant time series, because its standard deviation is zero. Thanks!
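To make the explanation concrete, a small sketch can check the per-row standard deviation before calling corrcoef. Pearson's r divides the covariance by the product of the two standard deviations, so a row with zero standard deviation leads to 0/0. The use of np.errstate here is just one way to silence the resulting floating-point warnings.

```python
import numpy as np

x = np.array([[5., 6., 7.],
              [1., 1., 1.]])

# The second row is constant, so its standard deviation is exactly zero
stds = x.std(axis=1)
print(stds)

# Dividing by a zero standard deviation yields NaN entries
with np.errstate(divide="ignore", invalid="ignore"):
    r = np.corrcoef(x)
print(r)
```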

Finding minimum value element-wise in three 2d submatrices

I'm trying to produce a color mapping of the convergence of a polynomial's roots in complex space. In order to do this, I have created a grid of points and applied Newton's method to those points, in order to find to which complex root they each converge. This gives me a 2d array of complex numbers, the elements of which denote the point to which they converge, within some tolerance. I want to be able to match the numbers in that matrix to an element-wise color mapping.
I have done this by iterating over the array and computing colors element-by-element, but it is very slow, and seems it would benefit from vectorizing. Here's my code so far:
def colorvec(rootsmatrix, known_roots):
    dim = len(known_roots)
    dist = ndarray((dim, nx, ny))
    for i in range(len(known_roots)):
        dist[i] = abs(rootsmatrix-known_roots[i])
This creates a 3d array with the distances of each point's computed root to each of the actual roots. It looks something like this, except with 75 000 000 elements.
[ [ [6e-15 7e-15 0.5]
    [1.5 5e-15 0.5] #submatrix 1
    [0.75 0.98 0.78] ]
  [ [1.5 0.75 0.5]
    [8e-15 5e-15 0.8] #submatrix 2
    [0.75 0.98 0.78] ]
  [ [1.25 0.5 5e-15]
    [0.5 0.64 4e-15] #submatrix 3
    [5e-15 4e-15 7e-15] ] ]
I want to take dist, and return the 1st dimension argument (i.e., 1, 2, or 3) for each 2nd- and 3rd-dimension argument, for which dist is minimum. That will be my color mapping. For example, comparing the element (0,0) of each of the 3 submatrices would yield that color(0,0) = 0. Similarly, color(1,1) = 0 and color (2,2) = 2. I want to be able to do this for the entire color matrix.
I haven't been able to find a way to do this using numpy.argmin, but I could be missing something. If there's another way to do this, I'd be happy to hear, especially if it doesn't involve loops. I'm making ~25MP images here, so looping takes fully 25 minutes to assign colors.
Thanks in advance for your advice!
You can pass an axis argument to argmin. You want to minimize along the first axis (what you're calling 'submatrices'), which is axis=0:
dist.argmin(0)
dist = array([[[ 6.00e-15, 7.00e-15, 5.00e-01],
[ 1.50e+00, 5.00e-15, 5.00e-01],
[ 7.50e-01, 9.80e-01, 7.80e-01]],
[[ 1.50e+00, 7.50e-01, 5.00e-01],
[ 8.00e-15, 5.00e-15, 8.00e-01],
[ 7.50e-01, 9.80e-01, 7.80e-01]],
[[ 1.25e+00, 5.00e-01, 5.00e-15],
[ 5.00e-01, 6.40e-01, 4.00e-15],
[ 5.00e-15, 4.00e-15, 7.00e-15]]])
dist.argmin(0)
#array([[0, 0, 2],
# [1, 0, 2],
# [2, 2, 2]])
This of course returns 0, 1, 2. If you want 1, 2, 3 as stated, use:
dist.argmin(0) + 1
#array([[1, 1, 3],
# [2, 1, 3],
# [3, 3, 3]])
Finally, if you actually want the minimum value itself (instead of which 'submatrix' it comes from), you can just use dist.min(0):
dist.min(0)
#array([[ 6.00e-15, 7.00e-15, 5.00e-15],
# [ 8.00e-15, 5.00e-15, 4.00e-15],
# [ 5.00e-15, 4.00e-15, 7.00e-15]])
If you want to use the minimum location from the dist matrix to pull a value out of another matrix, it's a little tricky, but you can use
minloc = dist.argmin(0)
other[minloc, np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
Note that if other=dist this gives the same output as just calling dist.min(0):
dist[dist.argmin(0), np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
#array([[ 6.00e-15, 7.00e-15, 5.00e-15],
# [ 8.00e-15, 5.00e-15, 4.00e-15],
# [ 5.00e-15, 4.00e-15, 7.00e-15]])
or if other just says which submatrix it is, you get the same thing back:
other = np.ones((3,3,3))*np.arange(1,4).reshape(3,1,1)
other
#array([[[ 1., 1., 1.],
# [ 1., 1., 1.],
# [ 1., 1., 1.]],
# [[ 2., 2., 2.],
# [ 2., 2., 2.],
# [ 2., 2., 2.]],
# [[ 3., 3., 3.],
# [ 3., 3., 3.],
# [ 3., 3., 3.]]])
other[dist.argmin(0), np.arange(dist.shape[1])[:, None], np.arange(dist.shape[2])]
#array([[ 1., 1., 3.],
# [ 2., 1., 3.],
# [ 3., 3., 3.]])
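As a possible alternative to building the index arrays by hand, np.take_along_axis (available in NumPy 1.15 and later) can do the same lookup. The sketch below reuses the example dist array and picks from dist itself, so the result should match dist.min(0). The names minloc and picked are illustrative.

```python
import numpy as np

dist = np.array([[[6.00e-15, 7.00e-15, 5.00e-01],
                  [1.50e+00, 5.00e-15, 5.00e-01],
                  [7.50e-01, 9.80e-01, 7.80e-01]],
                 [[1.50e+00, 7.50e-01, 5.00e-01],
                  [8.00e-15, 5.00e-15, 8.00e-01],
                  [7.50e-01, 9.80e-01, 7.80e-01]],
                 [[1.25e+00, 5.00e-01, 5.00e-15],
                  [5.00e-01, 6.40e-01, 4.00e-15],
                  [5.00e-15, 4.00e-15, 7.00e-15]]])

minloc = dist.argmin(0)

# The index array must have the same number of dimensions as the source,
# so add a leading axis, pick along axis 0, then drop that axis again
picked = np.take_along_axis(dist, minloc[None, :, :], axis=0)[0]
print(picked)
```

Replacing the first argument with another array of the same shape picks the corresponding values from that array instead.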
As an unrelated note, you can rewrite colorvec without that loop, assuming rootsmatrix.shape is (nx, ny) and known_roots.shape is (dim,):
def colorvec(rootsmatrix, known_roots):
    dist = abs(rootsmatrix - known_roots[:, None, None])
where known_roots[:, None, None] is the same as known_roots.reshape(len(known_roots), 1, 1) and causes it to broadcast with rootsmatrix.
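Putting the broadcasting and argmin steps together, a complete function might look like the sketch below. The name color_index, the choice to return the argmin directly, and the tiny test grid (cube roots of unity with small perturbations) are all assumptions for illustration, not the asker's actual setup.

```python
import numpy as np

def color_index(rootsmatrix, known_roots):
    """Index of the nearest known root for every point of the grid."""
    known_roots = np.asarray(known_roots)
    # Shape (dim, nx, ny): distance of each converged value to each root
    dist = np.abs(rootsmatrix - known_roots[:, None, None])
    # Collapse the first axis to the index of the smallest distance
    return dist.argmin(axis=0)

# Hypothetical example: approximate cube roots of unity
known_roots = np.array([1.0, -0.5 + 0.8660254j, -0.5 - 0.8660254j])
rootsmatrix = np.array([[1.0 + 1e-9j, -0.5 + 0.8660254j],
                        [-0.5 - 0.8660254j, 1.0]])

colors = color_index(rootsmatrix, known_roots)
print(colors)
```

The intermediate dist array still holds dim times as many elements as the grid, so for very large images it may be worth processing the grid in row blocks.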
