Python Numpy: Operation on each sequential element without for loop? - python

This is an extremely simple question, but extensive search has not provided me with a satisfactory answer.
I have an array of numbers that evolve "over time" e.g. x = [1, 2, 3, 4, 5] and I want to calculate the mean at each timepoint. With a for-loop I would simply do
import numpy as np
x = [1, 2, 3, 4, 5]
y = np.empty(5)
for i in range(5):
    y[i] = np.mean(x[0:i+1])
print(y)
[ 1. 1.5 2. 2.5 3. ]
In the processes I am working with, the numbers do not necessarily follow a simple dynamic like the one above. I wonder if there is some general way of applying an operation (such as calculating the mean) in a 'running' fashion that is quicker than a for-loop?

How about
a = np.array([1, 2, 3, 4, 5])
np.cumsum(a)/(np.arange(1, a.size + 1))
?
That will work for calculating a running average.
I wonder if there is some general way of applying an operation (such as calculating the mean) in a 'running' fashion
I can't provide an answer to this. It depends on the operation.
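To illustrate the "it depends" point: the running mean reduces to a cumulative sum, and any operation that is a binary NumPy ufunc (np.add, np.multiply, np.maximum, np.minimum, ...) can be applied in a running fashion with that ufunc's accumulate method. This is a minimal sketch; operations that cannot be written as a binary ufunc (a running median, for example) would need a different approach.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Running mean: cumulative sum divided by how many elements have been seen so far.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

# Binary ufuncs have an .accumulate method that applies them cumulatively.
running_prod = np.multiply.accumulate(x)   # product of x[0..i]

y = np.array([3, 1, 4, 1, 5])
running_max = np.maximum.accumulate(y)     # largest value seen up to index i

print(running_mean)
print(running_prod)
print(running_max)
```

All three results have the same length as the input, and element i depends only on elements 0..i, which is what "running" means in the question.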

Related

Nested array computations in Python using numpy

I am trying to use numpy in Python in solving my project.
I have a random binary array rndm = [1, 0, 1, 1] and a resource_arr = [[2, 3], 4, 2, [1, 2]]. What I am trying to do is multiply the arrays element-wise and then sum each nested entry. For the sample above, the expected output is
output = 5 0 2 3. I find it hard to solve this problem because of the nested array/list.
So far my code looks like this:
def fitness_score():
    output = numpy.add(rndm * resource_arr)
    return output
fitness_score()
I keep getting
ValueError: invalid number of arguments.
I think this is because of the addition I am trying to do. Any help would be appreciated. Thank you!
NumPy expects its arrays to be rectangular (like matrices), and resource_arr is not rectangular. In your case a plain Python list is more suitable:
import numpy

def sum_nested(l):
    tmp = []
    for element in l:
        if isinstance(element, list):
            tmp.append(numpy.sum(element))
        else:
            tmp.append(element)
    return tmp
In this function we check for each element inside l if it is a list. If so, we sum its elements. On the other hand, if the encountered element is just a number, we leave it untouched. Please note that this only works for one level of nesting.
Now, if we run sum_nested([[2, 3], 4, 2, [1, 2]]) we will get [5, 4, 2, 3]. All that's left is multiplying this result by the elements of rndm, which can be achieved easily using numpy:
def fitness_score(a, b):
    return numpy.multiply(a, sum_nested(b))
NumPy is all about non-jagged arrays. You can do things with jagged arrays, but doing so efficiently and elegantly isn't trivial.
It is almost always more flexible and more performant to find a way to map your data structure to a non-nested one, for instance by encoding the information as below.
resource_arr = (
    [0, 0, 1, 2, 3, 3],
    [2, 3, 4, 2, 1, 2],
)
That is, an integer denoting the 'row' each value belongs to, paired with an array of equal size of the values themselves.
This may 'feel' wasteful when coming from a C-style way of doing arrays (omg more memory consumption), but staying away from nested data structures is almost certainly your best bet, both in terms of performance and in terms of how much of the numpy/scipy ecosystem will actually be compatible with your data representation. Whether it really uses more memory is questionable anyway: every Python object carries a lot of overhead, so if you have only a few elements per nesting, it is the more memory-efficient solution too.
In this case, that would give you the following efficient solution to your problem:
output = np.bincount(*resource_arr) * rndm
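If the data arrives in the nested form, the (row, value) encoding can be built once up front. A minimal sketch, assuming only one level of nesting as in the question; flatten_with_rows is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

def flatten_with_rows(nested):
    """Hypothetical helper: encode a one-level jagged list as (row_index, value) arrays."""
    rows, values = [], []
    for row, element in enumerate(nested):
        # Treat a bare number as a row containing a single value.
        items = element if isinstance(element, list) else [element]
        rows.extend([row] * len(items))
        values.extend(items)
    return np.array(rows), np.array(values)

rndm = np.array([1, 0, 1, 1])
resource_arr = [[2, 3], 4, 2, [1, 2]]

rows, values = flatten_with_rows(resource_arr)
# bincount adds up the weights that share a row index, giving one total per row.
row_sums = np.bincount(rows, weights=values, minlength=len(resource_arr))
output = row_sums * rndm
print(output)
```

Note that bincount with weights returns floats, so output is a float array; minlength guarantees one entry per original row even if trailing rows are empty.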
I have not worked much with pandas/numpy, so I'm not sure if this is the most efficient way, but it works (at least for the example you have shown):
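import numpy as np
rndm = [1, 0, 1, 1]
resource_arr = [[2, 3], 4, 2, [1, 2]]
# dtype=object is needed for a ragged list; recent NumPy raises a ValueError without it.
# Note: int * list repeats the list, so sum() of it equals int * sum(list) for non-negative ints.
multiplied_output = np.multiply(rndm, np.array(resource_arr, dtype=object))
print(multiplied_output)
output = []
for elem in multiplied_output:
    output.append(sum(elem) if isinstance(elem, list) else elem)
final_output = np.array(output)
print(final_output)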

Average arrays to same fixed length

I have multiple arrays of different lengths, and I would like to average them down to arrays of a common length so they are comparable, e.g.
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([1, 2, 3, 4])
target_length = 3
def cast(array, target_length):
    ...
This should give cast(array1, target_length) as:
np.array([(1+2*0.66)/1.66, (2*0.33+3*1+4*0.33)/1.66, (4*0.66+5)/1.66 ])
because: 5/3=1.66. Also we would obtain:
cast(array2, target_length) as:
np.array([(1+2*0.33)/1.33, (2*0.66+3*0.66)/1.33, (3*0.33+4)/1.33])
because: 4/3=1.33.
The arrays will never need to grow, since a good numpy solution is already available for that case.
Is there a solution using the numpy library?
The question could be read in a few different ways, but if I got it right, what you are trying to achieve is:
def cast(array, target_length):
    target = np.zeros(target_length)
    for i in range(target_length*len(array)):
        target[i//len(array)] += array[i//target_length]/len(array)
    return target
If that's what you are aiming for, this may be obtained through numpy operations as
def cast(array, target_length):
    return np.mean(np.repeat(array, target_length).reshape(-1, len(array)), 1)
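As a sanity check, the vectorised cast can be compared against the weighted averages written out in the question, using exact thirds instead of 0.66/0.33. A minimal sketch, assuming that reading of the question is the intended one:

```python
import numpy as np

def cast(array, target_length):
    # Repeat each element target_length times, split the result into
    # target_length groups of len(array) samples, and average each group.
    return np.mean(np.repeat(array, target_length).reshape(-1, len(array)), 1)

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([1, 2, 3, 4])

# The question's expected values, written with exact fractions.
expected1 = np.array([
    (1 + 2 * 2/3) / (5/3),
    (2 * 1/3 + 3 + 4 * 1/3) / (5/3),
    (4 * 2/3 + 5) / (5/3),
])
expected2 = np.array([
    (1 + 2 * 1/3) / (4/3),
    (2 * 2/3 + 3 * 2/3) / (4/3),
    (3 * 1/3 + 4) / (4/3),
])

print(np.allclose(cast(array1, 3), expected1))
print(np.allclose(cast(array2, 3), expected2))
```

The repeat-then-reshape trick works because repeating every element target_length times makes the total length divisible by both lengths, so each output bin is a plain mean of equally weighted samples.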

Removing nested loops in numpy

I've been writing a program to brute-force check a sequence of numbers to look for Euler bricks, but the method I came up with involves a triple loop. Since nested Python loops are notoriously slow, I was wondering if there was a better way using numpy to create the array of values that I need.
# x = max side length of brick (user input).
for t in range(3,x):
    a, b, c = [], [], []
    for u in range(2,t):
        for v in range(1,u):
            a.append(t)
            b.append(u)
            c.append(v)
    a=np.array(a)
    b=np.array(b)
    c=np.array(c)
    ...
Is there a better way to generate the arrays of values using numpy commands?
Thanks.
Example:
If x=10, when t=3 I want to get:
a=[3]
b=[2]
c=[1]
the first time through the loop. After that, when t=4:
a=[4, 4, 4]
b=[2, 3, 3]
c=[1, 1, 2]
The third time (t=5) I want:
a=[5, 5, 5, 5, 5, 5]
b=[2, 3, 3, 4, 4, 4]
c=[1, 1, 2, 1, 2, 3]
and so on, up to max side lengths around 5000 or so.
EDIT: Solution
from numpy import array, arange, empty, hstack

a=array(3)
b=array(2)
c=array(1)
for i in range(4,x): # Removing the (3,2,1) check from code does not affect results.
    foo=arange(1,i-1)
    foo2=empty(len(foo))
    foo2.fill(i-1)
    c=hstack((c,foo))
    b=hstack((b,foo2))
    a=empty(len(b))
    a.fill(i)
    ...
Works many times faster now. Thanks all.
Try using np.empty and ndarray.fill (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.fill.html)
There are a couple of things which could help, but probably only for large values of x. For starters, use xrange instead of range; that will save creating a list you never need. You could also create empty numpy arrays of the correct length and fill them with values as you go, instead of appending to a list and then converting it into a numpy array.
I believe this code will work (no python access right this second):
import numpy as np

for t in xrange(3, x):
    # number of (u, v) pairs with 1 <= v < u < t
    size = (t - 2) * (t - 1) // 2
    a = np.zeros(size)
    b = np.zeros(size)
    c = np.zeros(size)
    idx = 0
    for u in xrange(2,t):
        for v in xrange(1,u):
            a[idx] = t
            b[idx] = u
            c[idx] = v
            idx += 1
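The inner loops can also be removed entirely. The (u, v) pairs with 1 <= v < u < t are exactly the strictly-lower-triangular indices of a (t-1) x (t-1) matrix, shifted by one, and np.tril_indices returns them in the same row-by-row order as the triple loop in the question. A minimal sketch (Python 3; brick_candidates is a hypothetical helper name):

```python
import numpy as np

def brick_candidates(t):
    """Hypothetical helper: arrays a, b, c with a == t and 1 <= c < b < t,
    in the same order as the nested loops in the question."""
    # Row/column indices strictly below the diagonal, row by row.
    rows, cols = np.tril_indices(t - 1, k=-1)
    b = rows + 1             # u runs over 2 .. t-1
    c = cols + 1             # v runs over 1 .. u-1
    a = np.full(b.size, t)   # every entry uses the same t
    return a, b, c

for t in range(3, 6):
    a, b, c = brick_candidates(t)
    print(t, a.tolist(), b.tolist(), c.tolist())
```

Each call allocates (t-1)(t-2)/2 entries per array, so for t near 5000 that is roughly 12.5 million elements per array; the outer loop over t stays in Python.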

compare two following values in numpy array

What is the best way to access two consecutive values in a numpy array?
example:
npdata = np.array([13,15,20,25])
for i in range( len(npdata) ):
    print npdata[i] - npdata[i+1]
This looks really messy, and it additionally needs exception handling for the last iteration of the loop, where npdata[i+1] is out of range.
Any ideas?
Thanks!
numpy provides a function diff for this basic use case
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1, 2, 3, -7])
Your snippet computes something closer to -numpy.diff(x).
How about range(len(npdata) - 1) ?
Here's code (using a simple array, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
...     print ar[i] + ar[i + 1]
...
3
5
7
9
As you can see it successfully prints the sums of all consecutive pairs in the array, without any exceptions for the last iteration.
You can use ediff1d to get differences of consecutive elements. More generally, a[1:] - a[:-1] will give the differences of consecutive elements and can be used with other operators as well.
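Putting the slicing idea together with the question's data, here is a short sketch showing that slicing, np.diff and np.ediff1d agree, and that the same pattern works with other operators:

```python
import numpy as np

npdata = np.array([13, 15, 20, 25])

# Differences of consecutive elements, three equivalent ways.
forward = npdata[1:] - npdata[:-1]
assert np.array_equal(forward, np.diff(npdata))
assert np.array_equal(forward, np.ediff1d(npdata))

# The loop in the question computes npdata[i] - npdata[i+1], i.e. the negation.
backward = npdata[:-1] - npdata[1:]

# The same slicing works with any element-wise operator.
pair_sums = npdata[:-1] + npdata[1:]
increasing = npdata[1:] > npdata[:-1]

print(forward)     # [2 5 5]
print(backward)    # [-2 -5 -5]
print(pair_sums)   # [28 35 45]
print(increasing)
```

Because both slices are views of length len(npdata) - 1, there is no out-of-range access and no special case for the last element.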

Accessing Lower Triangle of a Numpy Matrix?

Okay, so basically, let's say I have a matrix:
matrix([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])
Is it possible to get the area below the diagonal easily when working with numpy matrices? I looked around and could not find anything. I could do it the standard for-loop way, but wouldn't that defeat the performance benefit of numpy?
I am working on calculating statistics comparing model output results with actual results. The data I have been given results in a matrix of around 10,000 x 10,000. I am mainly trying to just sum those elements.
Is there an easy way to do this?
You could use tril and triu. See them here:
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html
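import numpy as np

def tri_flat(array):
    R = array.shape[0]
    # np.tri is True on and below the diagonal; inverting it keeps only
    # the entries strictly above the diagonal.
    mask = np.asarray(np.invert(np.tri(R,R,dtype=bool)),dtype=float)
    x,y = mask.nonzero()
    return array[x,y]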
I was looking for a convenience function myself, but this'll have to do... not sure it gets much easier. But if it does, I'd be interested to hear it. Every time you avoid a for-loop, an angel gets its wings.
-ejh
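Quick NB: This avoids the diagonal and, as written, picks the entries strictly above it; for a symmetric matrix those are the same values as the strict lower triangle. If you want the lower triangle including the diagonal, just omit the inversion (element-wise NOT). For the strict lower triangle of a non-symmetric matrix, you'll need a transpose in there.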
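Since the stated goal is to sum the elements below the diagonal, here are a few NumPy-only ways to do it, shown on the 5 x 5 example. A minimal sketch; an np.matrix can be converted first with np.asarray. For a 10,000 x 10,000 matrix, keep memory in mind: np.tril makes a full-size copy (about 800 MB for float64), and np.tril_indices builds two index arrays of roughly 50 million entries each, while the row-by-row version avoids both at the cost of one Python iteration per row.

```python
import numpy as np

# The matrix from the question, as a plain ndarray.
m = np.tile(np.arange(5), (5, 1))

# Option 1: zero everything on and above the diagonal, then sum.
strict_lower_sum = np.tril(m, k=-1).sum()

# Option 2: gather the strictly-lower entries with index arrays (row-major order).
rows, cols = np.tril_indices(m.shape[0], k=-1)
strict_lower_values = m[rows, cols]

# Option 3: sum row by row; no full-size temporary, one Python iteration per row.
row_by_row_sum = sum(m[i, :i].sum() for i in range(m.shape[0]))

print(strict_lower_sum, strict_lower_values.sum(), row_by_row_sum)
```

Use k=0 instead of k=-1 (and m[i, :i+1] in option 3) if the diagonal should be included.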
