I have an array that I've labeled using scipy.ndimage and I'd like to multiply each element by a factor specific to its corresponding label. I thought I could use ndimage.labeled_comprehension for this; however, I can't seem to figure out how to pass an extra argument to the function. For example:
import numpy as np
from scipy import ndimage

a = np.random.random(9).reshape(3,3)
lbls = np.repeat(np.arange(3),3).reshape(3,3)
ndx = np.arange(0,lbls.max()+1)
factors = np.random.randint(10,size=3)
>>> lbls
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
>>> ndx
array([0, 1, 2])
>>> factors
array([5, 4, 8])
def fn(a, x):
    return a*x
>>> b = ndimage.labeled_comprehension(a, labels=lbls, index=ndx, func=fn, out_dtype=float, default=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/tgrant/anaconda/envs/python2/lib/python2.7/site-packages/scipy/ndimage/measurements.py", line 416, in labeled_comprehension
do_map([input], temp)
File "/Users/tgrant/anaconda/envs/python2/lib/python2.7/site-packages/scipy/ndimage/measurements.py", line 411, in do_map
output[i] = func(*[inp[l:h] for inp in inputs])
TypeError: fn() takes exactly 2 arguments (1 given)
As expected it gives an error since fn() needs factors fed into it somehow. Is labeled_comprehension able to do this?
Index into factors and then simply multiply with the image array -
a*factors[lbls]
Sample run -
In [483]: a # image/data array
Out[483]:
array([[ 0.10682998,  0.29631501,  0.08501469],
       [ 0.46944505,  0.88346229,  0.75672908],
       [ 0.11381292,  0.24096868,  0.86438641]])
In [484]: factors # scaling factors
Out[484]: array([8, 1, 1])
In [485]: lbls # labels
Out[485]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
In [486]: factors[lbls] # factors populated based on the labels
Out[486]:
array([[8, 8, 8],
       [1, 1, 1],
       [1, 1, 1]])
In [487]: a*factors[lbls] # finally scale the image array
Out[487]:
array([[ 0.85463981,  2.37052006,  0.68011752],
       [ 0.46944505,  0.88346229,  0.75672908],
       [ 0.11381292,  0.24096868,  0.86438641]])
Related
I have a list of values [5, 5, 5, 5, 5] and I also have a matrix filled with 1s and 0s.
I want to build a new list that should look like this:
wherever there's a 1 in the matrix, add 2 to the corresponding value of v if the 1 is in the first row, and add 3 if it is in the second row.
example:
list:
v = [5,5,5,5,5]
matrix:
m = [[0, 1, 1, 0, 0], [0, 0, 1, 1, 0]]
final result:
v1 = [5,7,10,8,5]
Create a function that adds two array rows; its parameters can be 1D numeric arrays. It loops through the arrays and returns a result array that is the element-wise sum.
If your task requires it, add a check that the rows are of equal length and abort the function with an error if they are not.
Run this function on all of the matrix rows (each one scaled by its weight first) and then on the result of that and the input list, as sketched below.
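A minimal sketch of that approach, assuming plain Python lists (the helper name add_rows and the weights variable are just illustrative, not from the question):
def add_rows(x, y):
    # element-wise addition of two equal-length rows
    if len(x) != len(y):
        raise ValueError("rows must have the same length")
    return [a + b for a, b in zip(x, y)]

v = [5, 5, 5, 5, 5]
m = [[0, 1, 1, 0, 0], [0, 0, 1, 1, 0]]
weights = [2, 3]  # add 2 for 1s in the first row, 3 for 1s in the second

result = v
for row, w in zip(m, weights):
    scaled = [value * w for value in row]  # turn the 0/1 flags into 0/w
    result = add_rows(result, scaled)

print(result)  # [5, 7, 10, 8, 5]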
Hope I managed to be comprehensive enough
You can use the NumPy package for efficient code.
import numpy as np
v = [5,5,5,5,5]
matrix = [[0, 1, 1, 0, 0],
          [0, 0, 1, 1, 0]]
weights = np.array([2,3])
w_matrix = np.multiply(matrix, weights[:, np.newaxis]).sum(axis=0)
v1 = v + w_matrix
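With the sample data above this gives v1 = array([ 5,  7, 10,  8,  5]).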
classical python:
You can use a list comprehension (factors = [2,3] holds the per-row weights):
factors = [2,3]
to_add = [sum((A*B) for A,B in zip(factors,x)) for x in zip(*m)]
[a+b for a,b in zip(v, to_add)]
output: [5, 7, 10, 8, 5]
numpy:
That said, this is a perfect use case for numpy that is more efficient and less verbose:
import numpy as np
v = [5,5,5,5,5]
m = [[0, 1, 1, 0, 0], [0, 0, 1, 1, 0]]
factors = [2,3]
V = np.array(v)
M = np.array(m)
F = np.array(factors)
V+(M*F[:,None]).sum(0)
output: array([ 5, 7, 10, 8, 5])
I am trying to apply dask.array.apply_along_axis to a 2D array. However, since my array is a dask array, it always throws an exception like the following:
Traceback (most recent call last):
File "D:/test/apply_along_axis_test.py", line 22, in <module>
b = da.apply_along_axis(lambda a: a[index_array], 1, source_array)
File "D:\Program Files\Python3\lib\site-packages\dask\array\routines.py", line 383, in apply_along_axis
test_result = np.array(func1d(test_data, *args, **kwargs))
File "D:/test/apply_along_axis_test.py", line 22, in <lambda>
b = da.apply_along_axis(lambda a: a[index_array], 1, source_array)
IndexError: index 1 is out of bounds for axis 0 with size 1
However, when I apply this method to a numpy.array, it runs successfully. An example is:
import numpy as np
import dask.array as da

source_array = np.random.randint(0, 10, (2, 4))
index_array = np.asarray([[0, 0], [1, 0], [2, 1], [3, 2]])
b = np.apply_along_axis(lambda a: a[index_array], 1, source_array)
print(b)
source_array = da.from_array(source_array)
b = da.apply_along_axis(lambda a: a[index_array], 1, source_array)
I can successfully print b. However, the last line of the code throws the exception above. I think maybe I should use some map method like map_partitions, but I cannot find any method like that in dask.array.
I think this should be resolved by defining shape and dtype. You can either do this manually or use make_meta to infer what it should be:
In [55]: from dask.dataframe.utils import make_meta
In [56]: da.apply_along_axis(lambda a: a[index_array], 1, source_array,
...: shape=make_meta(source_array).shape,
...: dtype=make_meta(source_array).dtype).compute()
Out[56]:
array([[[2, 2],
        [1, 2],
        [1, 1],
        [6, 1]],

       [[1, 1],
        [6, 1],
        [9, 6],
        [3, 9]]])
You are also not the first person to run into this issue: https://github.com/dask/dask/issues/3727
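If you'd rather not rely on make_meta, passing shape and dtype by hand should also work, since dask only runs the failing test call when it has to infer them. A minimal sketch, assuming each length-4 row indexed with index_array yields an array of shape index_array.shape:
import numpy as np
import dask.array as da

source_array = np.random.randint(0, 10, (2, 4))
index_array = np.asarray([[0, 0], [1, 0], [2, 1], [3, 2]])

d = da.from_array(source_array, chunks=source_array.shape)
# Each row is fancy-indexed with index_array, so the per-row result has
# index_array's shape (4, 2) and the input's dtype.
b = da.apply_along_axis(lambda a: a[index_array], 1, d,
                        shape=index_array.shape, dtype=source_array.dtype)
print(b.compute())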
I am trying to build a meshgrid in order to compute an interpolation. I am basing my approach on this example. My function is the following:
def oscillatory_(a,b,d,w=w,c=c):
    x = np.array([a,b,d])
    return np.cos(2*math.pi*w[0,0] + np.sum(c.T*x, axis = 1))
Which I call through:
data = oscillatory_(*np.meshgrid(a,b,d,indexing='ij', sparse=True))
Where
a = grid_dim[:,0]
b = grid_dim[:,1]
d = grid_dim[:,2]
are just some values taken from grid_dim, which is a numpy ndarray.
When trying to run the code, I get the following error:
Traceback (most recent call last):
File "/usr/lib/python3.6/code.py", line 91, in runcode
exec(code, self.locals)
File "<input>", line 9, in <module>
File "<input>", line 3, in f
AttributeError: 'numpy.ndarray' object has no attribute 'cos'
I really don't understand why it is looking up cos as an attribute rather than calling it as a function, because if I run the function oscillatory_ outside of np.meshgrid() everything is OK.
I have also played with the toy example from the link, adding an np.cos call, and everything works fine. The problem is in my function and I am not able to figure out where.
I am doing this in order to compute an interpolation afterwards through:
my_interpolating_function = RegularGridInterpolator((a,b,d), data)
Any help would be highly appreciated on that one.
Thank you very much
Making an array from a sparse meshgrid produces an object-dtype array:
In [448]: I,J=np.meshgrid([0,1,2],[0,1], indexing='ij', sparse=True)
In [449]: I
Out[449]:
array([[0],
       [1],
       [2]])
In [450]: J
Out[450]: array([[0, 1]])
In [451]: np.array([I, J])
Out[451]:
array([array([[0],
              [1],
              [2]]), array([[0, 1]])], dtype=object)
With sparse=False, you get a valid numeric array:
In [452]: I,J=np.meshgrid([0,1,2],[0,1], indexing='ij', sparse=False)
In [453]: I
Out[453]:
array([[0, 0],
       [1, 1],
       [2, 2]])
In [454]: J
Out[454]:
array([[0, 1],
       [0, 1],
       [0, 1]])
In [455]: np.array([I, J])
Out[455]:
array([[[0, 0],
        [1, 1],
        [2, 2]],

       [[0, 1],
        [0, 1],
        [0, 1]]])
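To connect this back to the error in the question: calling np.cos on such an object-dtype array makes NumPy fall back to looking for a cos method on each element, and plain integer ndarrays don't have one. A rough illustration (whether np.array needs an explicit dtype=object, and whether you get an AttributeError or a TypeError, depends on your NumPy version):
import numpy as np

I, J = np.meshgrid([0, 1, 2], [0, 1], indexing='ij', sparse=True)
x = np.array([I, J], dtype=object)  # shapes (3, 1) and (1, 2) force an object array

try:
    np.cos(x)                       # object dtype: falls back to calling .cos() on each element
except (AttributeError, TypeError) as exc:
    print(exc)                      # e.g. 'numpy.ndarray' object has no attribute 'cos'
Using sparse=False (or broadcasting the sparse grids directly instead of stacking them with np.array) avoids the object array and lets np.cos work elementwise.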
I am seeing behaviour with numpy bincount that I cannot make sense of. I want to bin the values in a 2D array in a row-wise manner and see the behaviour below. Why would it work with dbArray but fail with simarray?
>>> dbArray
array([[1, 0, 1, 0, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1],
       [0, 1, 0, 1, 0]])
>>> N.apply_along_axis(N.bincount,1,dbArray)
array([[2, 3],
       [0, 5],
       [1, 4],
       [4, 1],
       [3, 2],
       [3, 2]], dtype=int64)
>>> simarray
array([[2, 0, 2, 0, 2],
       [2, 1, 2, 1, 2],
       [2, 1, 1, 1, 2],
       [2, 0, 1, 0, 1],
       [1, 0, 1, 1, 2],
       [1, 1, 1, 1, 1]])
>>> N.apply_along_axis(N.bincount,1,simarray)
Traceback (most recent call last):
File "<pyshell#31>", line 1, in <module>
N.apply_along_axis(N.bincount,1,simarray)
File "C:\Python27\lib\site-packages\numpy\lib\shape_base.py", line 118, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (3)
The problem is that bincount isn't always returning the same shaped objects, in particular when values are missing. For example:
>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
       [1, 2],
       [0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([2, 1]), array([1, 2]), array([0, 3])]
works, but:
>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
       [1, 1, 0],
       [1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
File "<ipython-input-49-72e06e26a718>", line 1, in <module>
np.apply_along_axis(np.bincount, 1, m)
File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)
won't.
You could use the minlength parameter and pass it using a lambda or partial or something:
>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
       [1, 2],
       [1, 2]])
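For the simarray from the original question, the same trick works once minlength is taken from the array's global maximum (this assumes you know, or can cheaply compute, that maximum up front):
>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=simarray.max() + 1),
...                     axis=1, arr=simarray)
array([[2, 0, 3],
       [0, 2, 3],
       [0, 3, 2],
       [2, 2, 1],
       [1, 3, 1],
       [0, 5, 0]])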
As #DSM has already mentioned, bincount of a 2d array cannot be done without knowing the maximum value of the array, because otherwise the rows of the result would have inconsistent sizes.
But thanks to the power of numpy's indexing, it was fairly easy to make a faster implementation of 2d bincount, as it doesn't use concatenation or anything.
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = np.arange(len(arr))
    for col in arr.T:
        count[indexing, col] += 1
    return count
t = np.array([[1,2,3],[4,5,6],[3,2,2]], dtype=np.int64)
print(bincount2d(t))
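For this t the output should be:
[[0 1 1 1 0 0 0]
 [0 0 0 0 1 1 1]
 [0 0 2 1 0 0 0]]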
P.S.
This:
import time

t = np.random.randint(0, 100, size=[10000, 100])  # np.empty is risky here: its uninitialized values may fall outside the bin range
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)
runs roughly 2 times faster than this:
t = np.random.randint(0, 100, size=[100, 10000])
s = time.time()
bincount2d(t)
e = time.time()
print(e - s)
because of the for loop iterating over columns. So, it's better to transpose your 2d array, if shape[0] < shape[1].
UPD
I don't think this can be done any better than the following (using Python/NumPy alone, I mean):
def bincount2d(arr, bins=None):
    if bins is None:
        bins = np.max(arr) + 1
    count = np.zeros(shape=[len(arr), bins], dtype=np.int64)
    indexing = (np.ones_like(arr).T * np.arange(len(arr))).T
    np.add.at(count, (indexing, arr), 1)
    return count
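A quick sanity check (same toy data as above) that the np.add.at version matches the loop-based one:
t = np.array([[1, 2, 3], [4, 5, 6], [3, 2, 2]], dtype=np.int64)
print(bincount2d(t))  # prints the same 3x7 table as the loop version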
This is a function that does exactly what you want, but without any loops.
def sub_sum_partition(a, partition):
    """
    Generalization of np.bincount(partition, a).
    Sums rows of a matrix for each value of array of non-negative ints.
    :param a: array_like
    :param partition: array_like, 1 dimension, nonnegative ints
    :return: matrix of shape ('one larger than the largest value in partition', a.shape[1:]).
        The i-th element is the sum of rows j in 'a' s.t. partition[j] == i
    """
    assert partition.shape == (len(a),)
    n = np.prod(a.shape[1:], dtype=int)
    bins = ((np.tile(partition, (n, 1)) * n).T + np.arange(n, dtype=int)).reshape(-1)
    sums = np.bincount(bins, a.reshape(-1))
    if n > 1:
        sums = sums.reshape(-1, *a.shape[1:])
    return sums
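A small usage sketch (made-up data, not from the question) to show what it computes:
a = np.array([[1., 2.], [3., 4.], [5., 6.]])
partition = np.array([0, 1, 0])
print(sub_sum_partition(a, partition))
# [[6. 8.]
#  [3. 4.]]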
I have a problem with lists/arrays/matrices in Python.
I have a list of matrices (or arrays, if need be) and I want to add a new column of ones (with the same number of rows) to every single one of them. How can I do this?
I've tried a couple of things and didn't have any success.
Thanks for the help.
Here's an example:
>>> A=[mat([[1,2,3],[4,5,6],[7,8,9]]),mat([[1,0,0],[0,1,0],[0,0,1]])]
>>> A
[matrix([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]), matrix([[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]])]
Using the answer you guys suggested:
>>> A = np.hstack((A, np.ones((A.shape[0],1),dtype=A.type)))
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
A = np.hstack((A, np.ones((A.shape[0],1),dtype=A.type)))
AttributeError: 'list' object has no attribute 'shape'
Example for a 2D NumPy ndarray:
>>> m = np.arange(12).reshape(3,4)
>>> m = np.hstack((m, np.ones((m.shape[0], 1), dtype=m.dtype)))
>>> m
array([[ 0,  1,  2,  3,  1],
       [ 4,  5,  6,  7,  1],
       [ 8,  9, 10, 11,  1]])
Edit: It works the same for a matrix. For a list of matrices, you can use a for loop:
>>> matrices = [np.matrix(np.random.randn(3,4)) for i in range(10)]
>>> for i, m in enumerate(matrices):
... matrices[i] = np.hstack((m, np.ones((m.shape[0], 1), dtype=m.dtype)))
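Equivalently, if you prefer building a new list instead of mutating in place, a list comprehension does the same thing:
>>> matrices = [np.hstack((m, np.ones((m.shape[0], 1), dtype=m.dtype))) for m in matrices]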
If your matrices are plain Python lists of lists instead (note that numpy matrices have no append method), you can append the ones directly.
2d column arrays (each inner list is a column):
for matrix in matrices:
    matrix.append([1] * len(matrix[0]))
2d row arrays (each inner list is a row):
for matrix in matrices:
    for row in matrix:
        row.append(1)