Variable Partial Array Summation in Python

I'm looking for a way to sum a 2D array ("a" in the example below) per column, starting from a row position that is defined per column in a separate 1D array ("ref" in the example below).
I have tried the following:
import numpy as np
a = np.arange(20).reshape(5, 4)
print(a) # representing an original large 2D array
ref = np.array([0, 2, 4, 1]) # reference array for defining start of sum
s = a.sum(axis=0)
print(s) # Works: sums all elements per column
s = a[2:].sum(axis=0)
print(s) # Works as well: sum from the third element till end per column
# This is what I look for: sum per column starting at element defined by ref[]
s = np.zeros(4).astype(int)  # makes an empty 1D array
for i in np.arange(4):  # for each column
    for j in np.arange(ref[i], 5):
        s[i] += a[j, i]  # sums all elements from ref[i] till end (i.e. row 5)
print(s)  # This is the desired outcome
for i in np.arange(4):
    s = a[ref[i]:].sum(axis=0)
print(s)  # No good; s is overwritten each pass, so this equals a[ref[3]:].sum(axis=0), and here ref[3] = 1
s = np.zeros(4).astype(int)  # makes an empty 1D array
for i in np.arange(4):
    s[i] = np.sum(a[ref[i]:, i])
print(s)  # Yes; this is also the desired outcome
Is it possible to realize this without using a for loop?
Does numpy have functions for doing this in a single step?
s = a[ref:].sum(axis=0)
This would be nice, but is not working.
Thank you for your time!

A basic solution based on np.cumsum:
In [1]: a = np.arange(15).reshape(5, 3)
In [2]: res = np.array([0, 2, 3])
In [3]: b = np.cumsum(a, axis=0)
In [4]: b
Out[4]:
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15],
       [18, 22, 26],
       [30, 35, 40]])
In [5]: a
Out[5]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
In [6]: b[res, np.arange(a.shape[1])]
Out[6]: array([ 0, 12, 26])
In [7]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[7]: array([30, 23, 14])
so this does not give us the result we want: b[res] is an inclusive cumulative sum, so subtracting it drops the start row itself. We need to prepend a row of zeros to b:
In [13]: b = np.vstack([np.zeros((1, a.shape[1])), b])
In [14]: b
Out[14]:
array([[ 0.,  0.,  0.],
       [ 0.,  1.,  2.],
       [ 3.,  5.,  7.],
       [ 9., 12., 15.],
       [18., 22., 26.],
       [30., 35., 40.]])
In [17]: b[-1, :] - b[res, np.arange(a.shape[1])]
Out[17]: array([ 30., 30., 25.])
which is, I believe, the desired output.
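For reference, the same per-column sums can also be computed in a single step with a boolean row mask instead of cumsum; a minimal sketch using the question's a and ref (not part of the original answer):
import numpy as np

a = np.arange(20).reshape(5, 4)
ref = np.array([0, 2, 4, 1])

# True from each column's start row downward, False above it
mask = np.arange(a.shape[0])[:, None] >= ref
s = np.where(mask, a, 0).sum(axis=0)
print(s)  # [40 39 18 52], matching the loop versions above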

Related

Extract a block from a 2D array

Suppose you have a 2D array filled with integers in a continuous manner, going from left to right and top to bottom. Hence it would look like
[[ 0,  1,  2,  3,  4],
 [ 5,  6,  7,  8,  9],
 [10, 11, 12, 13, 14],
 [15, 16, 17, 18, 19]]
Suppose now you have a 1D array of some of the integers shown in the array above. Let's say this array is [6, 7, 11]. I want to extract the block/chunk of the 2D array that contains the elements of the list. With these two inputs the result should be
[[ 6.,  7.],
 [11., nan]]
(I am padding with np.nan since the selected elements cannot otherwise be reshaped into a rectangle.)
This is what I have written. Is there a simpler way please?
import numpy as np

def my_fun(my_list):
    ids_down = 4
    ids_across = 5
    layout = np.arange(ids_down * ids_across).reshape((ids_down, ids_across))
    ids = np.where((layout >= min(my_list)) & (layout <= max(my_list)), layout, np.nan)
    r, c = np.unravel_index(my_list, ids.shape)
    out = np.nan * np.ones(ids.shape)
    for i, t in enumerate(zip(r, c)):
        out[t] = my_list[i]
    ax1_mask = np.any(~np.isnan(out), axis=1)
    ax0_mask = np.any(~np.isnan(out), axis=0)
    out = out[ax1_mask, :]
    out = out[:, ax0_mask]
    return out
Then trying my_fun([6,7,11]) returns
[[ 6.,  7.],
 [11., nan]]
This 100% NumPy solution works for both contiguous and non-contiguous arrays of wanted numbers.
a = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14],
              [15, 16, 17, 18, 19]])
n = np.array([6, 7, 11])
Identify the locations of the wanted numbers:
mask = np.isin(a, n)
Select the rows and columns that have the wanted numbers:
np.where(mask, a, np.nan)[mask.any(axis=1)][:, mask.any(axis=0)]
# array([[ 6.,  7.],
#        [11., nan]])
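Wrapped up as a small reusable function (the name extract_block is mine, not from the answer), the same approach reads:
import numpy as np

def extract_block(a, wanted):
    # Bounding block of the wanted values, np.nan everywhere else
    mask = np.isin(a, wanted)
    return np.where(mask, a, np.nan)[mask.any(axis=1)][:, mask.any(axis=0)]

a = np.arange(20).reshape(4, 5)
print(extract_block(a, [6, 7, 11]))
# [[ 6.  7.]
#  [11. nan]]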
One approach is to look for the bounding box by checking which elements of the array are contained in the second array, b. We can use scipy.ndimage:
from scipy import ndimage
m = np.isin(a, b)
a_components, _ = ndimage.measurements.label(m, np.ones((3, 3)))
bbox = ndimage.measurements.find_objects(a_components)
out = a[bbox[0]]
np.where(np.isin(out, b), out, np.nan)
array([[ 6.,  7.],
       [11., nan]])
Setup -
a = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14],
              [15, 16, 17, 18, 19]])
b = np.array([6,7,11])
Or for b = np.array([10,12,16]) we'd get:
m = np.isin(a, b)
a_components, _ = ndimage.measurements.label(m, np.ones((3, 3)))
bbox = ndimage.measurements.find_objects(a_components)
out = a[bbox[0]]
np.where(np.isin(out, b), out, np.nan)
array([[10., nan, 12.],
       [nan, 16., nan]])
We could also adapt the above for multiple bounding boxes by doing:
b = np.array([5, 11, 8, 14])
m = np.isin(a, b)
a_components, _ = ndimage.measurements.label(m, np.ones((3, 3)))
bbox = ndimage.measurements.find_objects(a_components)
l = []
for box in bbox:
    out = a[box]
    l.append(np.where(np.isin(out, b), out, np.nan))
print(l)
[array([[ 5., nan],
        [nan, 11.]]),
 array([[ 8., nan],
        [nan, 14.]])]
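Note that recent SciPy releases deprecate the scipy.ndimage.measurements namespace; label and find_objects are available directly on scipy.ndimage. An equivalent call under that assumption (my adaptation, not from the original answer):
import numpy as np
from scipy import ndimage

a = np.arange(20).reshape(4, 5)
b = np.array([6, 7, 11])

m = np.isin(a, b)
# np.ones((3, 3)) requests 8-connectivity, as in the answer above
labels, num_features = ndimage.label(m, structure=np.ones((3, 3)))
bbox = ndimage.find_objects(labels)
out = a[bbox[0]]
print(np.where(np.isin(out, b), out, np.nan))
# [[ 6.  7.]
#  [11. nan]]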
Taking advantage of the specific form of the template array A (each value equals its own flat index), we can transform the test values directly into coordinates:
A = np.arange(20).reshape(4, 5)
test = [6, 7, 11]
y, x = np.unravel_index(test, A.shape)
yl, yr = y.min(), y.max()
xl, xr = x.min(), x.max()
out = np.full((yr - yl + 1, xr - xl + 1), np.nan)
out[y - yl, x - xl] = test
out
# array([[ 6.,  7.],
#        [11., nan]])
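For a general array whose values are not their own flat indices, you would first recover the coordinates, e.g. with np.isin and np.nonzero. A sketch of that generalization (my own, assuming each wanted value occurs exactly once):
import numpy as np

A = np.arange(20).reshape(4, 5) * 10  # values no longer equal their indices
test = [60, 70, 110]

y, x = np.nonzero(np.isin(A, test))   # coordinates of the wanted values
out = np.full((y.max() - y.min() + 1, x.max() - x.min() + 1), np.nan)
out[y - y.min(), x - x.min()] = A[y, x]
print(out)
# [[ 60.  70.]
#  [110.  nan]]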

Replace values based on multiple conditions of two arrays?

Assume that I have two arrays
>>> import numpy as np
>>> a = np.random.randint(0, 10, size=(5, 4))
>>> a
array([[1, 6, 7, 4],
       [2, 7, 4, 2],
       [9, 3, 6, 4],
       [9, 6, 8, 2],
       [7, 2, 9, 5]])
>>> b = np.random.randint(0, 10, size=(5, 4))
>>> b
array([[ 5.,  8.,  6.,  5.],
       [ 1.,  8.,  4.,  8.],
       [ 1.,  4.,  6.,  3.],
       [ 4.,  8.,  6.,  4.],
       [ 8.,  7.,  7.,  5.]], dtype=float32)
Now I have a situation where I need to compare the elements of the two arrays and replace them with known values. For example, my conditions are:
if a == 0 then replace with 0 (or) if b == 0 then replace with 0
if a > 4 and < 11 then replace with 1 (or) if b > 1 and < 3 then replace with 1
if a > 10 and < 18 then replace with 2 (or) if b > 2 and < 5 then replace with 2
.
.
.
and finally
if a > 40 replace with 9 (or) if b > 9 then replace with 9.
The replaced values can be stored in a new array, which I need to use for another function.
The simplest form of element-wise comparison, like a[a > 2] = 1, works. But I am not aware of how to apply multiple such comparisons at once with the same method.
I am sure an easy way exists in numpy which I am unable to find. Any help is appreciated.
np.digitize should do what you want. The first argument is the array of values you want to replace, and the second is the sequence of thresholds (bin edges).
a_replace = np.digitize(a, [0, 4, 10, ..., 40], right=True)
b_replace = np.digitize(b, [0, 1, 2, ..., 9], right=True)
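For illustration, a minimal self-contained run of np.digitize; the bin edges here are stand-ins of my own choosing, not the question's full lists:
import numpy as np

a = np.array([[1, 6, 7, 4],
              [2, 7, 4, 2],
              [9, 3, 6, 4]])

bins = [0, 4, 10, 18, 40]  # illustrative edges only
# right=True: a value x is mapped to index i where bins[i-1] < x <= bins[i]
print(np.digitize(a, bins, right=True))
# [[1 2 2 1]
#  [1 2 1 1]
#  [2 1 2 1]]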

SciPy sparse matrix (COO,CSR): Clear row

For creating a SciPy sparse matrix, I have arrays of row and column indices I and J along with a data array V. I use those to construct a matrix in COO format and then convert it to CSR:
from scipy import sparse
matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()
I have a set of row indices for which the only entry should be a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, and do just that:
def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something
indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo
That works alright, but find tends to take a while.
Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)
If you intend to clear multiple rows at once, this
def _wipe_rows_csr(matrix, rows):
    assert isinstance(matrix, sparse.csr_matrix)

    # Zero out the entries of each wiped row
    for i in rows:
        matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0

    # Set the diagonal
    d = matrix.diagonal()
    d[rows] = 1.0
    matrix.setdiag(d)
    return
is by far the fastest method. It doesn't actually remove the rows; it sets all their entries to zero and then fixes up the diagonal.
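A quick sanity check of _wipe_rows_csr on a small example (matrix values of my choosing; the matrix is chosen so that every diagonal entry is stored and setdiag does not have to change the sparsity structure):
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(1.0, 17.0).reshape(4, 4))
_wipe_rows_csr(M, [1, 2])  # the function defined above
print(M.toarray())
# [[ 1.  2.  3.  4.]
#  [ 0.  1.  0.  0.]
#  [ 0.  0.  1.  0.]
#  [13. 14. 15. 16.]]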
If the entries are actually to be removed, one has to do some array manipulation. This can be quite costly, but if speed is no issue, this
def _wipe_row_csr(A, i):
    '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
    '''
    assert isinstance(A, sparse.csr_matrix)

    n = A.indptr[i+1] - A.indptr[i]
    assert n > 0

    A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
    A.data[A.indptr[i]] = 1.0
    A.data = A.data[:-n+1]

    A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
    A.indices[A.indptr[i]] = i
    A.indices = A.indices[:-n+1]

    A.indptr[i+1:] -= n-1
    return
replaces a given row i of the matrix by the single entry 1.0 on the diagonal.
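And a small check of this in-place removal variant (same example matrix as above, my values; note the function assumes row i has at least one stored entry):
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.arange(1.0, 17.0).reshape(4, 4))
_wipe_row_csr(A, 2)  # the function defined above
print(A.toarray())
# [[ 1.  2.  3.  4.]
#  [ 5.  6.  7.  8.]
#  [ 0.  0.  1.  0.]
#  [13. 14. 15. 16.]]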
np.in1d (or np.isin in newer NumPy) should be a faster way of finding the indices:
In [322]: I # from a np.arange(12).reshape(4,3) matrix
Out[322]: array([0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int32)
In [323]: indices=[i for i, x in enumerate(I) if x in [1,2]]
In [324]: indices
Out[324]: [2, 3, 4, 5, 6, 7]
In [325]: ind1=np.in1d(I,[1,2])
In [326]: ind1
Out[326]:
array([False, False,  True,  True,  True,  True,  True,  True, False,
       False, False], dtype=bool)
In [327]: np.where(ind1) # same as indices
Out[327]: (array([2, 3, 4, 5, 6, 7], dtype=int32),)
In [328]: I[~ind1] # same as the delete
Out[328]: array([0, 0, 3, 3, 3], dtype=int32)
Direct manipulation of the coo inputs like this is often a good way. But another option is to take advantage of the csr math abilities: you should be able to construct a diagonal matrix that zeros out the correct rows, and then add the ones back in.
Here's what I have in mind:
In [357]: A=np.arange(16).reshape(4,4)
In [358]: M=sparse.coo_matrix(A)
In [359]: M.A
Out[359]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
In [360]: d1=sparse.diags([(1,0,0,1)],[0],(4,4))
In [361]: d2=sparse.diags([(0,1,1,0)],[0],(4,4))
In [362]: (d1*M+d2).A
Out[362]:
array([[  0.,   1.,   2.,   3.],
       [  0.,   1.,   0.,   0.],
       [  0.,   0.,   1.,   0.],
       [ 12.,  13.,  14.,  15.]])
In [376]: x=np.ones((4,),bool);x[[1,2]]=False
In [378]: d1=sparse.diags([x],[0],(4,4),dtype=int)
In [379]: d2=sparse.diags([~x],[0],(4,4),dtype=int)
Doing this with lil format looks easy (here wipe = [1, 2]):
In [593]: Ml=M.tolil()
In [594]: Ml.data[wipe]=[[1]]*len(wipe)
In [595]: Ml.rows[wipe]=[[i] for i in wipe]
In [596]: Ml.A
Out[596]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  0,  0],
       [ 0,  0,  1,  0],
       [12, 13, 14, 15]], dtype=int32)
It's sort of what you are doing with csr format, but with lil it's easy to replace each row's data list with [1] and its column list with [i]. Conversion times (tolil etc.) can hurt run times, though.

Using interpolate function over 2-D array

I have a 1-D function that takes a long time to compute over a big 2-D array of x values, so it is much easier to build an interpolant with SciPy and then evaluate that instead, which is much faster. However, I cannot use the interpolation function on arrays that have more than one dimension.
Example:
import numpy as np
import scipy.interpolate

# First, I create the interpolation function on the domain I want to work in
x = np.arange(1, 100, 0.1)
f = np.exp(x)  # standing in for a complicated function
f_int = scipy.interpolate.InterpolatedUnivariateSpline(x, f, k=2)

# Now, in the code I do this
x = [[13, ..., 1], [99, ..., 45], [33, ..., 98], ..., [15, ..., 65]]
y = f_int(x)
# I want it to return y = [[f_int(13), ..., f_int(1)], ..., [f_int(15), ..., f_int(65)]]
But returns:
ValueError: object too deep for desired array
I know I could loop over all x members, but I don't know if it is a better option...
Thanks!
EDIT:
A function like this would also do the job:
def vector_op(function, values):
    orig_shape = values.shape
    values = np.reshape(values, values.size)
    return np.reshape(function(values), orig_shape)
I've tried the np.vectorize but it is too slow...
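For reference, a quick check of that vector_op helper (as defined in the EDIT above), with np.sqrt standing in for the expensive f_int; the example values are mine:
import numpy as np

vals = np.arange(9.0).reshape(3, 3)
print(vector_op(np.sqrt, vals))  # elementwise sqrt, original shape preserved
# [[0.         1.         1.41421356]
#  [1.73205081 2.         2.23606798]
#  [2.44948975 2.64575131 2.82842712]]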
If f_int wants single dimensional data, you should flatten your input, feed it to the interpolator, then reconstruct your original shape:
>>> x = np.arange(1, 100, 0.1)
>>> f = 2 * x # a simple function to see the results are good
>>> f_int = scipy.interpolate.InterpolatedUnivariateSpline(x, f, k=2)
>>> x = np.arange(25).reshape(5, 5) + 1
>>> x
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
>>> x_int = f_int(x.reshape(-1)).reshape(x.shape)
>>> x_int
array([[  2.,   4.,   6.,   8.,  10.],
       [ 12.,  14.,  16.,  18.,  20.],
       [ 22.,  24.,  26.,  28.,  30.],
       [ 32.,  34.,  36.,  38.,  40.],
       [ 42.,  44.,  46.,  48.,  50.]])
x.reshape(-1) does the flattening, and the .reshape(x.shape) returns it to its original form.
I think you want to make a vectorized function in numpy:
import numpy as np

# create some random test data
test = np.random.random((100, 100))

# a normal python function that you want to apply
def myFunc(i):
    return np.exp(i)

# now vectorize the function so that it will work on numpy arrays
myVecFunc = np.vectorize(myFunc)
result = myVecFunc(test)
I would use a combination of a list comprehension and map (there might be a way to use two nested maps that I'm missing)
In [24]: x
Out[24]: [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
In [25]: [map(lambda a: a*0.1, x_val) for x_val in x]
Out[25]:
[[0.1, 0.2, 0.30000000000000004],
 [0.1, 0.2, 0.30000000000000004],
 [0.1, 0.2, 0.30000000000000004]]
This is just for illustration purposes... replace lambda a: a*0.1 with your function, f_int. (In Python 3, map returns an iterator, so wrap each call in list() to materialize the rows.)

variable assignment: keep shape

...better to directly show the code. Here it is:
import numpy as np
a = np.zeros([3, 3])
a
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
b = np.random.random_integers(0, 100, size = (1, 3))
b
array([[ 10, 3, 8]])
c = np.random.random_integers(0, 100, size = (4, 3))
c
array([[22, 21, 14],
       [55, 64, 12],
       [33, 85, 98],
       [37, 44, 45]])
a = b will change dimensions of a
a = c will change dimensions of a
for a = b, I want:
array([[ 10.,   3.,   8.],
       [  0.,   0.,   0.],
       [  0.,   0.,   0.]])
and for a = c, I want:
array([[22, 21, 14],
       [55, 64, 12],
       [33, 85, 98]])
So I want to lock the shape of 'a' so that values being assigned to it get "cropped" if necessary. Of course without if statements.
The problem is that plain assignment does not copy any data at all: it just rebinds the name a to the other array object. What you want is to copy part of one array's data into the other.
So, if you know that b has only one row, you can do:
a[0] = b
And if you know that a is 3x3, then you could also do:
a = c[0:3]
Note that a[0] = b already copies b's values into a, so no .copy() is needed there. The slice a = c[0:3], however, only creates a view into c; to make a independent of c, use:
a = c[0:3].copy()
If you don't already know the sizes of the arrays, you can use the len() function (or the .shape attribute) to find them out at runtime.
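A short illustration of the view-vs-copy distinction (example values are mine):
import numpy as np

c = np.arange(12).reshape(4, 3)

a = c[0:3]           # a view: shares data with c
c[0, 0] = 99
print(a[0, 0])       # 99 -- the view sees the change

a = c[0:3].copy()    # an independent copy of the current values
c[0, 0] = 0
print(a[0, 0])       # still 99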
You can do this easily by using NumPy slice notation. Essentially, you need to ensure that the shapes of the left-hand and right-hand arrays match, and you can achieve this by slicing the corresponding arrays appropriately.
import numpy as np

a = np.zeros([3, 3])
b = np.array([[10, 3, 8]])
c = np.array([[22, 21, 14],
              [55, 64, 12],
              [33, 85, 98],
              [37, 44, 45]])

a[0] = b
print(a)

a = c[0:3]
print(a)
Output:
[[ 10.   3.   8.]
 [  0.   0.   0.]
 [  0.   0.   0.]]
[[22 21 14]
 [55 64 12]
 [33 85 98]]
It seems you want to replace elements in the top left of a 2D array with elements from a second 2D array without worrying about the sizes of the arrays. Here is a method:
def replacer(orig, repl):
    # Copy orig, then overwrite its top-left corner with as much of repl as fits
    new = np.copy(orig)
    rows = min(orig.shape[0], repl.shape[0])
    cols = min(orig.shape[1], repl.shape[1])
    new[:rows, :cols] = repl[:rows, :cols]
    return new

print(replacer(a, b))
print(replacer(a, c))
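With the question's a, b and c, these calls print (modern NumPy float formatting):
[[10.  3.  8.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
[[22. 21. 14.]
 [55. 64. 12.]
 [33. 85. 98.]]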
