python sum columns in matrix - python

Here is a piece of my script. It should open a matrix (in the file matrix_seeds_to_all_targets) and sum all the elements in each column, so that at the end I get a 1xN array. What I get instead is an error: AttributeError: 'list' object has no attribute 'sum'. Could you please give me any insight into this?
def collapse_probtrack_results(waytotal_file, matrix_file):
    with open(waytotal_file) as f:
        waytotal = int(f.read())
    f = open(wayfile_template + roi + "/matrix_seeds_to_all_targets")
    l = [map(int, line.split(',')) for line in f if line.strip() != ""]
    collapsed = l.sum(axis=0) / waytotal * 100.
    return collapsed
    print (collapsed)

As the message says: lists don't have a method named sum. It isn't clear just what you are trying to do on that line, so I can't be more helpful than that.

You could just use numpy instead of trying to sum over lists:
import numpy as np
matrix = np.random.randint(0, 100, (3, 6))  # read in your matrix file here
newMatrix = np.sum(matrix, axis=0)
print(newMatrix)
which will give you something like:
[168 51 23 115 208 54]
Without numpy, you would have to use something like a list comprehension to go over the "columns" in your lists and sum them. Python's built-in sum works on a flat list, which isn't what you have when you 1) have a matrix and 2) want to sum over its columns.
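To tie this back to the question, here is a minimal sketch that assumes the matrix file holds comma-separated integers, one row per line. The file is simulated with io.StringIO, and matrix_text and the waytotal value are made up for illustration:

```python
import io
import numpy as np

# Stand-in for the matrix file: comma-separated integers, blank lines allowed.
matrix_text = io.StringIO("1,2,3\n4,5,6\n\n7,8,9\n")
waytotal = 10  # hypothetical value; the question reads it from a second file

# Parse each non-empty line into a list of ints, then build a 2-D array.
rows = [[int(x) for x in line.split(",")] for line in matrix_text if line.strip()]
matrix = np.array(rows)

# Sum down each column (axis=0) and scale to a percentage of waytotal.
collapsed = matrix.sum(axis=0) / waytotal * 100.0
print(collapsed)
```

Converting the parsed rows to an array is what makes .sum(axis=0) available; a plain list of lists has no such method.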

I think that the instruction l.sum() is wrong. The function used to sum over a list is sum and must be used as in this sample:
myList = [1, 2, 3]
sum(myList) # will return 6
myList.sum() # will throw an error
If you want to select a given column, you can use this list comprehension: [row[columnID] for row in A]
So, for instance, this code will sum each column of a 2D list named l:
numCols = len(l[0])
result = []
for i in range(numCols):
    result.append(sum([row[i] for row in l]))
print(result)
Also, it seems that in your code there's a print after a return. I think it will never execute ;)
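A shorter way to write the same column sum, as a sketch on a made-up list of rows, is to transpose with zip. Note that on Python 3 map returns an iterator, so the rows in the question would need list(map(int, ...)) before they can be indexed:

```python
# Hypothetical 3x3 matrix stored as a list of rows.
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# zip(*rows) regroups the values column by column; sum() then adds each group.
column_sums = [sum(column) for column in zip(*rows)]
print(column_sums)  # [12, 15, 18]
```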

Related

Merge the first element of two different arrays into one array in Python

Say I have:
data_x = [0,1,2,3]
data_y = [4,5,6,7]
and I need the final result to be:
s_o_rms = [(0,4),(1,5),(2,6),(3,7)]
Till now I tried:
'''
i = 0
j = 0
s_o_rms = []
for i in data_x:
    for j in data_y:
        s_o_rms.append(data_x(i)+','+data_y(i))
        i = i + 1
        j = j + 1
print(s_o_rms)
'''
However I am getting an Error: 'numpy.ndarray' object is not callable.
Any idea how I can solve this problem? Or is there another method I can use to obtain the needed result?
Note: data_x and data_y actually have 68 elements each, which is why I am using for loops, but for the sake of explaining my problem I'm using smaller arrays.
s_o_rms = []
for i,j in zip(data_x,data_y):
    s_o_rms.append((i,j))
print(s_o_rms)
You can try a list comprehension:
Result = [(data_x[index], data_y[index]) for index in range(len(data_x))]
As someone already mentioned, zip() is another easy way.
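For reference, a small sketch of both approaches on numpy arrays like the ones in the question. The error comes from data_x(i), which tries to call the array; indexing uses square brackets. The int() conversions are an assumption here, added only so the tuples hold plain Python integers:

```python
import numpy as np

data_x = np.array([0, 1, 2, 3])
data_y = np.array([4, 5, 6, 7])

# Index with square brackets; data_x(i) would try to call the array.
by_index = [(int(data_x[k]), int(data_y[k])) for k in range(len(data_x))]

# zip pairs the elements directly, with no index bookkeeping.
by_zip = [(int(x), int(y)) for x, y in zip(data_x, data_y)]

print(by_zip)  # [(0, 4), (1, 5), (2, 6), (3, 7)]
```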

np.vectorize fails on a 2-d numpy array as input

I am trying to vectorize a function that takes a numpy array as input. I have a 2-d numpy array (shape is 1000,100) on which the function is to be applied on each of the 1000 rows. I tried to vectorize the function using np.vectorize. Here is the code:
def fun(i):
    print(i)
    location = geocoder.google([i[1], i[0]], method="reverse")
    #print type(location)
    location = str(location)
    location = location.split("Reverse")
    if len(location) > 1:
        location1 = location[1]
    return [i[0], i[1], location1]
#using np.vectorize
vec_fun = np.vectorize(fun)
Which raises the error
<ipython-input-19-1ee9482c6161> in fun(i)
      1 def fun(i):
      2     print(i)
----> 3     location = geocoder.google([i[1], i[0]], method="reverse")
      4     #print type(location)
      5     location = str(location)
IndexError: invalid index to scalar variable.
I printed the argument that is passed to fun: it is a single value (the first element of the row) rather than the whole row, which is the reason for the IndexError, but I have no idea how to resolve this.
By this time I think you have solved your problem. However, I just found a way to solve this that may help other people with the same question. You can pass a signature string to np.vectorize to specify the input and output shapes. For example, the signature "(n)->()" takes a 1-D input of length n (one row) and returns a scalar, so the function is applied to each row in turn:
def my_sum(row):
    return np.sum(row)
row_sum = np.vectorize(my_sum, signature="(n)->()")
my_mat = np.array([
    [1, 1, 1],
    [2, 2, 2],
])
row_sum(my_mat)
OUT: array([3, 6])
vectorize runs your function on each element of an array, so it's not the right choice. Use a regular loop instead:
for row in some_array:
    i0, i1, loc = fun(row)
It's up to you as to what you want to do with the output. Keep in mind that your function does not assign location1 if len(location) <= 1, and will raise an error in that case. It also returns a string rather than a numerical value in the third output.
Once you fix those issues, if you want to make an array of the output:
output = np.empty((some_array.shape[0], 3))
for i, row in enumerate(some_array):
    output[i, :] = fun(row)
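One possible way to address both issues is sketched below. reverse_lookup is a hypothetical stand-in for the geocoder call, describe_row and coords are made-up names, and the results go into a list because a float array cannot hold the string field:

```python
import numpy as np

def reverse_lookup(lat, lon):
    # Hypothetical stand-in for the geocoder call; returns a string or None.
    return "somewhere" if lat >= 0 else None

def describe_row(row):
    # Always bind the result, so a failed lookup cannot leave it undefined.
    place = reverse_lookup(row[1], row[0])
    return (float(row[0]), float(row[1]), place if place is not None else "")

coords = np.array([[10.0, 20.0], [30.0, -5.0]])  # made-up (lon, lat) rows

# A plain loop passes one whole row at a time, unlike np.vectorize's default.
results = [describe_row(row) for row in coords]
print(results)
```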

Return elements in a location corresponding to the minimum values of another array

I have two arrays with the same shape in the first two dimensions and I'm looking to record the minimum value in each row of the first array. However I would also like to record the elements in the corresponding position in the third dimension of the second array. I can do it like this:
A = np.random.random((5000, 100))
B = np.random.random((5000, 100, 3))
A_mins = np.ndarray((5000, 4))
for i, row in enumerate(A):
    current_min = min(row)
    A_mins[i, 0] = current_min
    A_mins[i, 1:] = B[i, row == current_min]
I'm new to programming (so correct me if I'm wrong) but I understand that with Numpy doing calculations on whole arrays is faster than iterating over them. With this in mind is there a faster way of doing this? I can't see a way to get rid of the row == current_min bit even though the location of the minimum point must have been 'known' to the computer when it was calculating the min().
Any tips/suggestions appreciated! Thanks.
Something along the lines of what @lib talked about:
index = np.argmin(A, axis=1)
A_mins[:,0] = A[np.arange(len(A)), index]
A_mins[:,1:] = B[np.arange(len(A)), index]
It is much faster than using a for loop.
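A quick sanity check, sketched on smaller made-up arrays, that the indexed version gives the same result as the loop from the question. One difference to keep in mind: argmin returns the first occurrence when a row has tied minima, whereas the loop's boolean mask would select several entries and fail on assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 10))
B = rng.random((50, 10, 3))

# Loop version from the question.
loop_mins = np.empty((50, 4))
for i, row in enumerate(A):
    current_min = row.min()
    loop_mins[i, 0] = current_min
    loop_mins[i, 1:] = B[i, row == current_min]

# Vectorised version: one argmin call, then paired row/column indexing.
rows = np.arange(len(A))
index = np.argmin(A, axis=1)
fast_mins = np.empty((50, 4))
fast_mins[:, 0] = A[rows, index]
fast_mins[:, 1:] = B[rows, index]

print(np.allclose(loop_mins, fast_mins))
```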
To get the index of the minimum value, use argmin instead of min plus a comparison; amin returns the minimum value itself.
The amin function (like argmin and many other functions in numpy) also takes the argument axis, which you can use to get the minimum of each row or each column.
See http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html

Dealing with multi-dimensional arrays when ndims not known in advance

I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop through all values without knowing the number of dimensions or the shapes in advance? If I knew a variable had exactly 2 dimensions, I would do
shp=myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)
# This is similar to above
for value in a.ravel():
    do_something_with(value)
# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
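As a concrete sketch of the ndindex pattern, here is a made-up function, clip_negatives, that runs unchanged on 2-D and 3-D input. For this particular operation a boolean mask (arr[arr < 0] = 0) would be the usual vectorised choice; the loop is only meant to show the dimension-independent iteration.

```python
import numpy as np

def clip_negatives(arr):
    # Visit every index tuple, whatever the number of dimensions.
    for idx in np.ndindex(arr.shape):
        if arr[idx] < 0:
            arr[idx] = 0
    return arr

a2 = np.array([[1, -2], [-3, 4]])
a3 = np.arange(-4, 4).reshape(2, 2, 2)

print(clip_negatives(a2))
print(clip_negatives(a3))
```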
On an unrelated note, numpy arrays are indexed a[i, j] instead of a[i][j]. In python a[i, j] is equivalent to indexing with a tuple, i.e. a[(i, j)].
You can use the flat property of numpy arrays, which returns an iterator over all values (no matter the shape).
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x // 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
       [2, 5, 3]])
I am not sure the order in which flat returns the elements is guaranteed in any way. I believe it iterates through the elements as they are laid out in memory, so with a consistent array convention you will likely always get the same order unless you deliberately change it, but be careful.
And this will work for any dimension.
** -- Edit -- **
To clarify what I meant by 'order not guaranteed', the order of elements returned by flat does not change, but I think it would be unwise to count on it for things like row1 = A.flat[:N], although it will work most of the time.
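For what it's worth, the NumPy documentation describes the flat iterator as running in row-major (C-style) order, so the order should follow the indices rather than the memory layout. A small check, using a Fortran-ordered copy of a made-up array:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
F = np.asfortranarray(A)  # same values, column-major memory layout

# Both iterate in index order (last axis fastest), not memory order.
print(list(A.flat))
print(list(F.flat))
```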
This might be the easiest with recursion:
import numpy
a = numpy.array(range(30)).reshape(5, 3, 2)
def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)
recursive_do_something(a)
In case you want the indices:
a = numpy.array(range(30)).reshape(5, 3, 2)
def do_something(x, indices):
    print(indices, x)
def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        # include the position along the last axis in the reported index
        for i, obj in enumerate(array):
            do_something(obj, indices + [i])
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])
recursive_do_something(a)
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of
from itertools import product
shp = myarray.shape
# one range per axis, so this works for any number of dimensions
for idx in product(*(range(n) for n in shp)):
    do_something(myarray[idx])

Iterating through a multidimensional array in Python

I have created a multidimensional array in Python like this:
self.cells = np.empty((r,c),dtype=object)
Now I want to iterate through all elements of my two-dimensional array, and I do not care about the order. How do I achieve this?
It's clear you're using numpy. With numpy you can just do:
for cell in self.cells.flat:
    do_something(cell)
If you need to change the values of the individual cells, then ndenumerate (in numpy) is your friend. Even if you don't, it probably still is!
for index, value in np.ndenumerate(self.cells):
    do_something(value)
    self.cells[index] = new_value
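A self-contained sketch of the ndenumerate pattern on an object array like the one in the question. The grid size and the string labels written into each cell are made up for illustration:

```python
import numpy as np

r, c = 2, 3  # made-up grid size
cells = np.empty((r, c), dtype=object)

# ndenumerate yields ((row, col), value) pairs; assign back through the index.
for index, value in np.ndenumerate(cells):
    cells[index] = "cell%d%d" % index

print(cells)
```

Because index is a tuple, the same loop works without changes if the array gains more dimensions (the label format string aside).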
Just iterate over one dimension, then the other.
for row in self.cells:
    for cell in row:
        do_something(cell)
Of course, with only two dimensions, you can compress this down to a single loop using a list comprehension or generator expression, but that's not very scalable or readable:
for cell in (cell for row in self.cells for cell in row):
    do_something(cell)
If you need to scale this to multiple dimensions and really want a flat list, you can write a flatten function.
You can get the index of each element as well as the element itself using enumerate:
for (i,row) in enumerate(cells):
    for (j,value) in enumerate(row):
        print(i,j,value)
i and j contain the row and column index of the element, and value is the element itself.
How about this:
import itertools
for cell in itertools.chain(*self.cells):
    cell.drawCell(surface, posx, posy)
No one has an answer that will work for arbitrarily many dimensions without numpy, so I'll put here a recursive solution that I've used:
def iterThrough(lists):
    if not hasattr(lists[0], '__iter__'):
        for val in lists:
            yield val
    else:
        for l in lists:
            for val in iterThrough(l):
                yield val
for val in iterThrough(
    [[[111,112,113],[121,122,123],[131,132,133]],
     [[211,212,213],[221,222,223],[231,232,233]],
     [[311,312,313],[321,322,323],[331,332,333]]]):
    print(val)
# 111
# 112
# 113
# 121
# ..
This doesn't have very good error checking, but it works for me.
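A slightly more defensive variant, offered as a sketch rather than a drop-in replacement: it checks each item instead of only the first one, so empty and unevenly nested lists are handled, and it treats strings as leaf values (on Python 3 strings have __iter__, so the hasattr test above would recurse into them). The name iter_through and the sample data are made up.

```python
def iter_through(nested):
    # Recurse into lists and tuples; yield everything else as a leaf value.
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from iter_through(item)
        else:
            yield item

ragged = [[1, [2, 3]], [], [[4], "five"]]  # made-up, unevenly nested input
flat = list(iter_through(ragged))
print(flat)  # [1, 2, 3, 4, 'five']
```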
It may also be worth mentioning itertools.product().
import itertools
cells = [[x*y for y in range(5)] for x in range(10)]
for x,y in itertools.product(range(10), range(5)):
    print("(%d, %d) %d" % (x,y,cells[x][y]))
It can create the cartesian product of an arbitrary number of iterables:
cells = [[[x*y*z for z in range(3)] for y in range(5)] for x in range(10)]
for x,y,z in itertools.product(range(10), range(5), range(3)):
    print("(%d, %d, %d) %d" % (x,y,z,cells[x][y][z]))
