np.vectorize fails on a 2-d numpy array as input - python

I am trying to vectorize a function that takes a numpy array as input. I have a 2-d numpy array (shape (1000, 100)) and the function should be applied to each of the 1000 rows. I tried to vectorize the function using np.vectorize. Here is the code:
def fun(i):
    print(i)
    location = geocoder.google([i[1], i[0]], method="reverse")
    #print type(location)
    location = str(location)
    location = location.split("Reverse")
    if len(location) > 1:
        location1 = location[1]
    return [i[0], i[1], location1]
#using np.vectorize
vec_fun = np.vectorize(fun)
Which raises the error
<ipython-input-19-1ee9482c6161> in fun(i)
1 def fun(i):
2 print(i)
----> 3 location = geocoder.google([i[1], i[0]], method="reverse")
4 #print type(location)
5 location = lstr(location)
IndexError: invalid index to scalar variable.
I printed the argument that is passed to fun: it is a single value (the first element of the array) rather than a whole row. That explains the index error, but I have no idea how to resolve it.
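The behaviour described above can be reproduced without geocoder. This is a minimal sketch, assuming a hypothetical recorder function `record` in place of fun; it only stores the number of dimensions of each argument it receives:

```python
import numpy as np

seen_ndims = []  # dimensionality of every argument np.vectorize hands over

def record(x):
    # Hypothetical stand-in for fun(); no geocoding involved
    seen_ndims.append(np.ndim(x))
    return 0

# otypes is given so that np.vectorize does not need a trial call
# to infer the output type
np.vectorize(record, otypes=[int])(np.ones((2, 3)))

# Every argument is 0-d (a scalar), never a row, so i[0] cannot work
print(set(seen_ndims))
```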

By this time I think you have solved your problem. However, I just found a way to solve this that may help other people with the same question. You can pass a signature string to np.vectorize in order to specify the core input and output shapes. For example, the signature "(n) -> ()" takes a 1-d input of length n (a row) and returns a scalar (). The function is therefore applied to each row:
def my_sum(row):
    return np.sum(row)
row_sum = np.vectorize(my_sum, signature="(n) -> ()")
my_mat = np.array([
    [1, 1, 1],
    [2, 2, 2],
])
row_sum(my_mat)
OUT: array([3, 6])
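The same idea may extend to a function that returns several values per row, which is closer to the original question. A sketch under assumptions: `first_two_and_total` is a hypothetical stand-in for the geocoding function, and the signature "(n)->(m)" declares a 1-d input and a 1-d output whose length is taken from what the function returns:

```python
import numpy as np

def first_two_and_total(row):
    # Hypothetical stand-in for fun(): returns three numbers per row
    return np.array([row[0], row[1], row.sum()])

# "(n)->(m)": each call receives a row of length n and returns a vector of length m
per_row = np.vectorize(first_two_and_total, signature="(n)->(m)")

data = np.arange(12.0).reshape(4, 3)
result = per_row(data)
print(result.shape)  # one output row per input row
```

Note that np.vectorize is still a Python-level loop, so this is a convenience rather than a speed-up.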

vectorize runs your function on each element of an array, so it's not the right choice. Use a regular loop instead:
for row in some_array:
    i0, i1, loc = fun(row)
It's up to you as to what you want to do with the output. Keep in mind that your function does not assign location1 if len(location) <= 1, and will raise an error in that case. It also returns a string rather than a numerical value in the third output.
Once you fix those issues, if you want to make an array of the output:
output = np.empty((some_array.shape[0], 3))
for i, row in enumerate(some_array):
    output[i, :] = fun(row)
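If the third output stays a string, a plain list may be easier than a numeric array. The following is only a sketch: `describe_row` is a hypothetical replacement for fun that makes no network call, and it returns None where the original would have left location1 unassigned:

```python
import numpy as np

def describe_row(row):
    # Hypothetical stand-in for fun(); the label plays the role of the location
    label = "positive" if row[0] > 0 else None  # None marks a failed lookup
    return row[0], row[1], label

some_array = np.array([[1.0, 2.0, 9.0],
                       [-1.0, 5.0, 9.0]])

# One tuple per row; mixed numeric and string values are fine in a list
results = [describe_row(row) for row in some_array]
print(results)
```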

Related

How to add a value to every element in a specified column of a 2D array?

I have a matrix, a:
a = np.random.randn(10,7)
a = a.astype(int)
How do I add 2 to every element in a specified column, let's say column 3?
I have tried a few approaches such as result = [x+2 for x in a] , but that doesn't work for 2D arrays because I don't know how to specify a column, and result = np.add(a(:,3(i+2))) which obviously gives me invalid syntax.
Thanks.
Try this (with zero-based indexing, column 3 is index 2):
a = np.random.randn(10, 7)
a = a.astype(int)
print(a)
a[:, 2] += 2
print(a)
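Because += modifies the array in place, it may be worth working on a copy when the original values are still needed. A small sketch with a deterministic array (the name `shifted` is only illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

shifted = a.copy()     # leave the original untouched
shifted[:, 2] += 2     # add 2 to every element of the column at index 2

print(a[:, 2])         # original column
print(shifted[:, 2])   # same column with 2 added
```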

iterate through number of columns, with variable columns

For example, let's consider this toy code
import numpy as np
import numpy.random as rnd
a = rnd.randint(0,10,(10,10))
k = (1,2)
b = a[:,k]
for col in np.arange(np.size(b,1)):
    b[:,col] = b[:,col]+col*100
This code will work when the size of k is bigger than 1. However, with the size equal to 1, the extracted sub-matrix from a is transformed into a row vector, and applying the function in the for loop throws an error.
Of course, I could fix this by checking the dimension of b and reshaping:
if np.ndim(b) == 1:
    b = np.reshape(b, (np.size(b), 1))
in order to obtain a column vector, but this is expensive.
So, the question is: what is the best way to handle this situation?
This seems like something that would arise quite often and I wonder what is the best strategy to deal with it.
If you index with a list or tuple, the 2d shape is preserved:
In [638]: a=np.random.randint(0,10,(10,10))
In [639]: a[:,(1,2)].shape
Out[639]: (10, 2)
In [640]: a[:,(1,)].shape
Out[640]: (10, 1)
And I think b iteration can be simplified to:
a[:,k] += np.arange(len(k))*100
This sort of calculation will also be easier if k is always a list or tuple, and never a scalar (a scalar does not have a len).
np.column_stack ensures its inputs are 2d (and expands at the end if not) with:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
np.atleast_2d does
elif len(ary.shape) == 1:
    result = ary[newaxis,:]
which of course could be changed in this case to
if b.ndim==1:
    b = b[:,None]
Anyway, I think it is better to ensure that k is a tuple rather than adjust the shape of b afterwards. But keep both options in your toolbox.
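One way to guarantee that k is always a sequence is to normalize it with np.atleast_1d before indexing. This is a sketch, not the only option; the helper name `add_column_offsets` is made up for illustration:

```python
import numpy as np

def add_column_offsets(a, k):
    # Hypothetical helper: a scalar k becomes a 1-element index array,
    # so a[:, cols] is 2-d whether k is 1 or (1, 2)
    cols = np.atleast_1d(k)
    a[:, cols] += np.arange(cols.size) * 100   # modifies a in place
    return a[:, cols]

a = np.zeros((3, 4), dtype=int)
single = add_column_offsets(a, 1)       # scalar column index
pair = add_column_offsets(a, (1, 2))    # tuple of column indices
print(single.shape, pair.shape)
```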

python sum columns in matrix

Here is a piece of my script. It should open a matrix (in the file matrix_seeds_to_all_targets) and sum all the elements in each column (so at the end I should get a 1xN array). What I get instead is an error: AttributeError: 'list' object has no attribute 'sum'. Could you please give me any insight on this?
def collapse_probtrack_results(waytotal_file, matrix_file):
    with open(waytotal_file) as f:
        waytotal = int(f.read())
    f = open(wayfile_template + roi + "/matrix_seeds_to_all_targets")
    l = [map(int, line.split(',')) for line in f if line.strip() != ""]
    collapsed = l.sum(axis=0) / waytotal * 100.
    return collapsed
    print (collapsed)
As the message says: lists don't have a method named sum. It isn't clear just what you are trying to do on that line, so can't be more helpful than that.
You could just use numpy instead of trying to sum over lists:
import numpy as np
matrix = np.random.randint(0, 100, (3, 6))  # read in your matrix file here
newMatrix = np.sum(matrix, axis=0)
print(newMatrix)
which will give you something like:
[168 51 23 115 208 54]
Without numpy, you would have to use something like a list comprehension to go over the "columns" in your lists and sum them. Python's built-in sum works on a flat list, which isn't what you have if you 1) have a matrix and 2) want to sum over columns.
I think that the instruction l.sum() is wrong. The function used to sum over a list is sum and must be used as in this sample:
myList = [1, 2, 3]
sum(myList) # will return 6
myList.sum() # will throw an error
If you want to select a given column, you can use this list comprehension: [row[columnID] for row in A]
So, for instance, this code will sum each column of a 2D list named l.
numCols = len(l[0])
result = []
for i in range(numCols):
    result.append(sum([row[i] for row in l]))
print(result)
Also, there seems to be a print after a return in your code. I think it will never execute ;)
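Putting the suggestions together, the parsing and the column sums might look like the sketch below. The in-memory `lines` list and the `waytotal` value are made-up stand-ins for the two files in the question:

```python
import numpy as np

# Stand-in for the lines of the matrix file (blank lines are skipped)
lines = ["1,2,3", "4,5,6", "", "7,8,9"]
waytotal = 4  # assumed value; the question reads it from a file

# Build real lists (not map objects) so the rows can be reused
rows = [[int(x) for x in line.split(",")] for line in lines if line.strip()]

# Pure Python: zip(*rows) regroups the values by column
col_sums_py = [sum(col) for col in zip(*rows)]

# NumPy: convert once, then sum along axis 0 (down the columns)
col_sums_np = np.array(rows).sum(axis=0)
collapsed = col_sums_np / waytotal * 100.0
print(col_sums_py, collapsed)
```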

Evaluate absolute value of array element in function

I defined a function
def softthresh(u, LAMBDA):
    if np.fabs(u) <= LAMBDA:
        return 0
    else:
        return ((np.fabs(u) - LAMBDA) * u / np.fabs(u))
u is a numpy array, and np.fabs will check the relations for each array element (np.fabs(u_i)). It gives me the following error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Follow up Question:
Strange behaviour in simple function.
def softthresh(u,LAMBDA):
    for i in u:
        if np.fabs(i)<=LAMBDA:
            return 0
        else:
            return ((np.fabs(i)-LAMBDA)*u/np.fabs(i))
ll = 5.0
xx = np.arange(-10,11)
yy = softthresh(xx,ll)
What I get is not what I expect. For the elements of u (= xx) that are smaller than 5 I should get zero, but I don't. Why?
You are calling return from inside the loop. Therefore, your function returns just after it evaluates the first member of u.
Since you are using NumPy, you should take advantage of NumPy's ability to operate on the whole array at once, and also of NumPy's smart indexing.
def softthreshold(u, LAMBDA):
    notzero = np.fabs(u) > LAMBDA # boolean mask of the elements that need to be scaled
    rr = np.zeros_like(u, dtype=float) # same shape as u, initialized to 0; float so integer input is not truncated
    rr[notzero] = (np.fabs(u[notzero])-LAMBDA)*u[notzero]/np.fabs(u[notzero]) # scale each of the members that aren't zero
    return rr
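The same function can also be written without a mask, since (|u| - LAMBDA) * u / |u| equals sign(u) * (|u| - LAMBDA) wherever |u| > LAMBDA. A sketch of that closed form (the name `soft_threshold` and the float conversion are choices made here, not part of the question):

```python
import numpy as np

def soft_threshold(u, lam):
    # Convert to float so integer input is not truncated
    u = np.asarray(u, dtype=float)
    # max(|u| - lam, 0) is zero inside the threshold; sign(u) restores the sign
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

xx = np.arange(-10, 11)
yy = soft_threshold(xx, 5.0)
print(yy)
```

This form also avoids the division by |u|, so an element equal to 0 never causes a divide-by-zero.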
Your problems depends on the numpy array. If you are working with a list it works.
Otherwise if you need the numpy array you can use code like
def softthresh(u,LAMBDA):
    for i in u:
        if np.fabs(i)<=LAMBDA:
            return 0
        else:
            return ((np.fabs(u)-LAMBDA)*u/np.fabs(u))
You get the array through the dependency of <= logic and the numpy.array definition.
If u is an array, you need to loop through all its elements in your function.
Alternatively, you can have u be an element of your array and call it with a loop like this:
tbl = np.array([1, 2, 3, 4, 5])
for elt in tbl:
    print(softthresh(elt, 3))
Results would be :
0
0
0
1.0
2.0
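The explicit loop over elements can also be hidden behind np.vectorize, which calls the scalar function once per element and collects the results in an array. A sketch, assuming the scalar softthresh from the question and float output (`softthresh_vec` is a name introduced here):

```python
import numpy as np

def softthresh(u, LAMBDA):
    # Scalar version from the question: u must be a single number
    if np.fabs(u) <= LAMBDA:
        return 0
    else:
        return (np.fabs(u) - LAMBDA) * u / np.fabs(u)

# otypes fixes the output dtype; otherwise it is inferred from the first result
softthresh_vec = np.vectorize(softthresh, otypes=[float])

tbl = np.array([1, 2, 3, 4, 5])
out = softthresh_vec(tbl, 3)
print(out)
```

This is still a Python loop internally, so whole-array operations remain the faster choice for large inputs.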

Dealing with multi-dimensional arrays when ndims not known in advance

I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop thru all values without knowing the number of dimensions, or shapes in advance? If I knew a variable was exactly 2 dimensions, I would do
shp=myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)
# This is similar to above
for value in a.ravel():
    do_something_with(value)
# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, numpy arrays are indexed a[i, j] instead of a[i][j]. In python a[i, j] is equivalent to indexing with a tuple, ie a[(i, j)].
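Since the goal is to alter some values, two dimension-agnostic ways of writing back may be useful: a boolean mask, and np.nditer opened for writing. A sketch with an arbitrary rule (replace odd values by -1) chosen only for illustration:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # any number of dimensions works

# Option 1: boolean mask, no explicit loop
b = a.copy()
b[b % 2 == 1] = -1

# Option 2: nditer with write access; x[...] = value writes into the array
c = a.copy()
with np.nditer(c, op_flags=["readwrite"]) as it:
    for x in it:
        if x % 2 == 1:
            x[...] = -1

print(np.array_equal(b, c))
```

When the rule can be expressed as a condition on the whole array, the mask is usually the simpler and faster of the two.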
You can use the flat property of numpy arrays, which returns an iterator (a numpy.flatiter) over all values (no matter the shape).
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x / 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
       [2, 5, 3]])
I was not sure whether the order in which flat returns the elements is guaranteed in any way. NumPy documents flat iteration as row-major (C-style, last index varying fastest) regardless of how the array is laid out in memory, so the order is deterministic; just be careful if you think of your data in column-major terms.
And this will work for any dimension.
** -- Edit -- **
To clarify what I meant by 'order not guaranteed', the order of elements returned by flat does not change, but I think it would be unwise to count on it for things like row1 = A.flat[:N], although it will work most of the time.
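The ordering question can be checked directly. A small sketch comparing a C-ordered array with a Fortran-ordered copy of the same data (the variable names are illustrative):

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]])
f_arr = np.asfortranarray(c_arr)   # same values, column-major memory layout

# flat walks the array in row-major (C-style) index order in both cases
flat_c = list(c_arr.flat)
flat_f = list(f_arr.flat)

# Column-major order has to be requested explicitly
col_major = f_arr.ravel(order="F").tolist()
print(flat_c, flat_f, col_major)
```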
This might be the easiest with recursion:
a = numpy.array(range(30)).reshape(5, 3, 2)
def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)
recursive_do_something(a)
In case you want the indices:
a = numpy.array(range(30)).reshape(5, 3, 2)
def do_something(x, indices):
    print(indices, x)
def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        for i, obj in enumerate(array):
            # append the last index so each value gets its full index
            do_something(obj, indices + [i])
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])
recursive_do_something(a)
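If recursion is not needed for anything else, np.ndenumerate yields the full index tuple together with each value for an array of any dimension. A sketch that collects the pairs in a list (`visited` is a name introduced here) instead of calling do_something:

```python
import numpy as np

a = np.arange(30).reshape(5, 3, 2)

# Each item is (index_tuple, value); the tuple has one entry per dimension
visited = [(idx, int(x)) for idx, x in np.ndenumerate(a)]

print(visited[0], visited[-1])
```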
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of
from itertools import product
for idx in product(*(range(n) for n in shp)):
    do_something(myarray[idx])
