Accessing Lower Triangle of a Numpy Matrix? - python

Okay, so let's say I have a matrix:
matrix([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])
Is it possible to get the area below the diagonal easily when working with numpy matrices? I looked around and could not find anything. I could use a standard for loop, but wouldn't that throw away the performance numpy gives me?
I am working on calculating statistics comparing model output with actual results. The data I have been given results in a matrix of around 10,000 x 10,000. I mainly just want to sum those elements.
Is there an easy way to do this?

You could use tril and triu. See them here:
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html
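A minimal sketch of how tril could be applied to the 5 x 5 example from the question. The variable names are made up for illustration, and k=-1 is used on the assumption that "below the diagonal" means the diagonal itself is excluded:

```python
import numpy as np

# Rebuild the 5 x 5 example from the question as a plain ndarray
a = np.tile(np.arange(5), (5, 1))

# np.tril zeroes everything above the chosen diagonal;
# k=-1 also zeroes the main diagonal, k=0 (the default) keeps it
total_strict = np.tril(a, k=-1).sum()
total_with_diag = np.tril(a).sum()

# Alternative: pull out only the lower-triangle values via index arrays
rows, cols = np.tril_indices(a.shape[0], k=-1)
values = a[rows, cols]

print(total_strict, total_with_diag, values.sum())
```

Note that np.tril returns a full-size copy, so for a 10,000 x 10,000 matrix it may be worth checking memory use before settling on either variant.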

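import numpy as np

def tri_flat(array):
    R = array.shape[0]
    # np.tri(R, R) is True on and below the diagonal; inverting it
    # selects the elements strictly above the diagonal
    mask = np.invert(np.tri(R, R, dtype=bool))
    x, y = mask.nonzero()
    return array[x, y]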
I was looking for a convenience function myself, but this will have to do. I'm not sure it gets much easier, but if it does, I'd be interested to hear it. Every time you avoid a for-loop, an angel gets its wings.
-ejh
Quick NB: this skips the diagonal. If you want the diagonal and your matrix is symmetric, just omit the inversion (the element-wise NOT). Otherwise, you'll need a transpose in there.

Related

Nested array computations in Python using numpy

I am trying to use numpy in Python for my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is multiply the arrays element-wise and then sum each nested entry. The expected output for the sample above is
output = 5 0 2 3. I find this problem hard to solve because of the nested array/list.
So far my code looks like this:
def fitness_score():
    output = numpy.add(rndm * resource_arr)
    return output
fitness_score()
I keep getting
ValueError: invalid number of arguments.
I think this is because of the addition I am trying to do. Any help would be appreciated. Thank you!
Numpy treats its arrays as rectangular matrices, and resource_arr is jagged, so it is not a (valid) matrix. In your case a python list is more suitable:
def sum_nested(l):
    tmp = []
    for element in l:
        if isinstance(element, list):
            tmp.append(numpy.sum(element))
        else:
            tmp.append(element)
    return tmp
In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get a list holding the values 5, 4, 2 and 3. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
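For completeness, here is a self-contained sketch that puts the two functions above together and runs them on the sample data from the question. The list comprehension and the built-in sum are a stylistic substitution, not part of the original answer:

```python
import numpy as np

def sum_nested(items):
    # Sum one level of nesting; plain numbers pass through unchanged
    return [sum(x) if isinstance(x, list) else x for x in items]

def fitness_score(mask, resources):
    # Element-wise product of the binary mask and the per-entry sums
    return np.multiply(mask, sum_nested(resources))

rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]
result = fitness_score(rndm, resource_arr)
print(result)  # [5 0 2 3]
```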
Numpy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.
Almost always, finding a way to map your data structure to a non-nested one, for instance by encoding the information as below, will be more flexible and more performant.
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (more memory consumption), but staying away from nested data structures is almost certainly your best bet, both for performance and for how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is questionable anyway: every new python object costs a fair number of bytes, so if you have only a few elements per nested list, the flat layout is the more memory-efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
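As a rough sketch of how one might get from the nested list to that flat encoding and then apply bincount: the names row_ids, values and nested below are made up for this example, and the flattening loop is just one possible way to do the conversion.

```python
import numpy as np

rndm = np.array([1, 0, 1, 1])
nested = [[2, 3], 4, 2, [1, 2]]

# Flatten once: record which 'row' every value came from
row_ids, values = [], []
for i, item in enumerate(nested):
    group = item if isinstance(item, list) else [item]
    row_ids.extend([i] * len(group))
    values.extend(group)

# bincount with weights sums the values that share a row id
row_sums = np.bincount(row_ids, weights=values, minlength=len(nested))
output = row_sums * rndm
print(output)
```

Because weights are used, bincount returns floats; cast with astype(int) if integer output matters.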
I have not worked much with pandas/numpy so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):
import numpy as np
rndm = [1, 0, 1, 1]
# dtype=object is needed on recent NumPy versions, which refuse to
# build an array from a jagged list implicitly
resource_arr = np.array([[2, 3], 4, 2, [1, 2]], dtype=object)
multiplied_output = np.multiply(rndm, resource_arr)
print(multiplied_output)
output = []
for elem in multiplied_output:
    output.append(sum(elem) if isinstance(elem, list) else elem)
final_output = np.array(output)
print(final_output)

How to fix method in python that returns a 2d array with empty array elements?

The following code implements a backtracking algorithm to find all possible permutations of a given array of numbers; the record variable stores each permutation when the code reaches the base case. The code seems to run correctly, that is, the record variable gets filled with valid permutations, but for some reason the method returns a two-dimensional array whose elements are empty.
I tried declaring record as a tuple or a dictionary and tried using global and nonlocal variables, but none of it worked.
def permute(arr):
    record = []
    def createPermutations(currentArr, optionArr):
        if len(optionArr) == 0:
            if len(currentArr) != 0: record.append(currentArr)
            else: pass
            print(record)
        else:
            for num in range(len(optionArr)):
                currentArr.append(optionArr[num])
                option = optionArr[0:num] + optionArr[num+1::]
                createPermutations(currentArr, option)
                currentArr.pop()
    createPermutations([], arr)
    return record
print(permute([1,2,3]))
The expected result is [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]], but instead I got [[], [], [], [], [], []].
With recursive functions, you should pass a copy of the current array, rather than having all of those currentArr.pop() mutating the same array.
Replace
createPermutations(currentArr, option)
by
createPermutations(currentArr[:], option)
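Put together, the function with that one-line change might look like the sketch below. The snake_case names and the early return are cosmetic changes made for this example; the only functional difference from the question's code is the current[:] copy.

```python
def permute(arr):
    record = []

    def create_permutations(current, options):
        if not options:
            # Base case: 'current' is a private copy, so storing it is safe
            if current:
                record.append(current)
            return
        for i, value in enumerate(options):
            current.append(value)
            remaining = options[:i] + options[i + 1:]
            # Pass a copy so the pop() below cannot touch what gets recorded
            create_permutations(current[:], remaining)
            current.pop()

    create_permutations([], arr)
    return record

print(permute([1, 2, 3]))
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```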
Finally, as a learning exercise for recursion, something like this is fine, but if you need permutations for a practical programming problem, use itertools:
import itertools
print([list(p) for p in itertools.permutations([1,2,3])])
I would accept John Coleman's answer as it is the correct way to solve your issue and resolves other bugs that you run into as a result.
The reason you run into this issue is that python is pass-by-object-reference: the function receives the actual list itself, not a copy of it. This leads to another issue in your code: when you print(record) inside the function, you get [[3, 2, 1], [3, 2, 1], [3, 2, 1], [3, 2, 1], [3, 2, 1], [3, 2, 1]] as your output.
This happens because every call to record.append(currentArr) appends a reference to the same object as all the other calls. You therefore end up with 6 references to the same list (in this case currentArr), because all your appends point to it. A 2d list is just a list of pointers to other lists.
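A tiny standalone demo of that aliasing, separate from the permutation code (the variable names are only for illustration):

```python
current = []
record = []

current.append(1)
record.append(current)      # stores a reference to the same list
record.append(current[:])   # stores an independent snapshot

current.pop()               # mutates the shared list only

print(record)               # [[], [1]]
print(record[0] is current) # True
```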
Now that you understand this, it is easier to see why you get [[],[],[],[],[],[]] as your final output. Because you add to currentArr here: currentArr.append(optionArr[num])
and then pop from it here:
currentArr.pop() to return it to its previous state,
your final version of currentArr will be what you passed in, i.e. [].
Since record is a 2d list holding 6 references to currentArr, you will get [[],[],[],[],[],[]] as your returned value.
This may help you understand better how it all works, since it has diagrams as well: https://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/

Dynamic matrix in Python

I'm new to Python and I need a dynamic matrix that I can manipulate by adding more columns and rows to it. I read about numpy.matrix, but I can't find a method there that does what I described. It occurred to me to use lists, but I want to know if there is a simpler way to do it or a better implementation.
Example of what I look for:
matrix.addrow ()
matrix.addcolumn ()
matrix.changeValue (0, 0, "$200")
Am I asking for too much? If so, any ideas of how to implement something like that? Thanks!
You can do all of that in numpy (np.concatenate, for example) or native python (my_list.append()). Which one is more efficient depends on what else your program does: numpy will probably be less efficient if all you are doing is adding or changing values one at a time, or doing a lot of column 'adding' or 'removing'. However, if you do matrix or column operations, the overhead of adding new columns to a numpy array may be offset by the vectorized computation speed numpy offers. So pick whichever you prefer, and if speed is an issue, experiment with both approaches yourself.
There are several ways to represent matrices in Python. You can use a list of lists or numpy arrays. For example, if you were to use numpy arrays:
>>> import numpy as np
>>> a = np.array([[1,2,3], [2,3,4]])
>>> a
array([[1, 2, 3],
       [2, 3, 4]])
To add a row
>>> np.vstack([a, [7,8,9]])
array([[1, 2, 3],
       [2, 3, 4],
       [7, 8, 9]])
To add a column
>>> np.hstack((a, [[7],[8]]))
array([[1, 2, 3, 7],
       [2, 3, 4, 8]])
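For comparison, a minimal sketch of the list-of-lists route, which may be the better fit if the matrix has to hold mixed values such as the "$200" string from the question. The variable names and sample values are assumptions made for this example:

```python
matrix = [[1, 2, 3],
          [2, 3, 4]]

# Add a row: append a new inner list
matrix.append([7, 8, 9])

# Add a column: append one value to every row
new_column = [10, 11, 12]
for row, value in zip(matrix, new_column):
    row.append(value)

# Change a single value; plain lists accept mixed types
matrix[0][0] = "$200"

print(matrix)
# [['$200', 2, 3, 10], [2, 3, 4, 11], [7, 8, 9, 12]]
```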

Python Numpy: Operation on each sequential element without for loop?

This is an extremely simple question, but extensive search has not provided me with a satisfactory answer.
I have an array of numbers that evolve "over time" e.g. x = [1, 2, 3, 4, 5] and I want to calculate the mean at each timepoint. With a for-loop I would simply do
import numpy as np
x = [1, 2, 3, 4, 5]
y = np.empty(5)
for i in range(5):
    y[i] = np.mean(x[0:i+1])
print(y)
[ 1. 1.5 2. 2.5 3. ]
In the processes I am working with, the numbers do not necessarily follow a simple dynamic like the one above. I wonder if there is some general way of applying an operation (such as calculating the mean) in a 'running' fashion, that is quicker than a for-loop?
How about
a = np.array([1, 2, 3, 4, 5])
np.cumsum(a)/(np.arange(1, a.size + 1))
?
That will work for calculating a running average.
I wonder if there is some general way of applying an operation (such as calculating the mean) in a 'running' fashion
I can't provide an answer to this. It depends on the operation.
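One partial option, for operations that happen to be binary NumPy ufuncs: they expose an accumulate method that applies the operation cumulatively. This is only a sketch with made-up sample data, and it does not cover arbitrary operations:

```python
import numpy as np

x = np.array([3, 1, 4, 1, 5])

# ufunc.accumulate applies a binary ufunc cumulatively along the array
running_sum = np.add.accumulate(x)      # equivalent to np.cumsum(x)
running_max = np.maximum.accumulate(x)
running_min = np.minimum.accumulate(x)

# The running mean follows from the running sum
running_mean = running_sum / np.arange(1, x.size + 1)

print(running_sum, running_max, running_min, running_mean)
```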

Theory and step-by-step instruction of scipy.ndimage.convolve

Good day to all.
Please help me understand the theory behind the function scipy.ndimage.convolve for 1D arrays. I know the formula from http://lagrange.univ-lyon1.fr/docs/scipy/0.17.1/generated/scipy.ndimage.convolve.html
C_i = \sum_j{I_{i+j-k} W_j},
but I can't understand how to get the results manually.
For example: test_1 = scipy.ndimage.convolve([1, 2, 3], [1, 2, 3, 4, 5])
result is [24 24 30]
Or test_2 = scipy.ndimage.convolve([1, 2, 3], [3, 4, 5])
result is [15 22 31]
If I wrote out all the attempts I have made, it would take a lot of space.
Please give me step-by-step instructions for working through these examples manually.
There are two tricky things going on here:
1) ndimage has a parameter called "mode", which is set to "reflect" by default
2) convolution internally reverses one of the inputs
Try comparing this piece of code
scipy.ndimage.convolve([1, 2, 3][::-1], [1, 2, 3, 4, 5], mode='constant')
to your by-hand solution (get rid of the "[::-1]" if you've already accounted for the reversal).
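To make the two points concrete, here is a by-hand sketch in plain numpy that reproduces both results from the question. The function name is made up, it assumes odd-length weights and the default origin, and it relies on np.pad's "symmetric" mode matching what ndimage calls "reflect" (the edge value is repeated):

```python
import numpy as np

def convolve1d_reflect(signal, weights):
    signal = np.asarray(signal)
    weights = np.asarray(weights)
    m = len(weights)
    k = m // 2  # centre of the weights

    # Step 1: extend the input past both ends, e.g. [1, 2, 3] with
    # two extra samples per side becomes [2, 1, 1, 2, 3, 3, 2]
    padded = np.pad(signal, (m - 1 - k, k), mode="symmetric")

    # Step 2: reverse the weights (this is what makes it a convolution)
    flipped = weights[::-1]

    # Step 3: slide the reversed weights along and take dot products
    return np.array([np.dot(padded[i:i + m], flipped)
                     for i in range(len(signal))])

print(convolve1d_reflect([1, 2, 3], [1, 2, 3, 4, 5]))  # [24 24 30]
print(convolve1d_reflect([1, 2, 3], [3, 4, 5]))        # [15 22 31]
```

For the first output value of test_1, the window is [2, 1, 1, 2, 3] and the reversed weights are [5, 4, 3, 2, 1], giving 10 + 4 + 3 + 4 + 3 = 24.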
