numpy applying aggregated information across a 2D array - python

I have a list of agents' "health" values in a 2D array:
health=[[0.5,0.8],[0.1,0.5],[0.5,0.7]]
and a set of actions (one per agent):
actions=[[0,1],[2,0],[1,1]]
(possible actions = {0, 1, 2})
Depending on their actions, their health gets increased according to a payoff matrix:
payoff[agent_row][action] =
[[0, 0.2, 0.5],
 [0, 1, 0.6],
 [0, 0.2, 0.4]]
Calculating their new health is done by:
health = [[0.5, 0.8], [0.1, 0.5], [0.5, 0.7]]
actions = [[0, 1], [2, 0], [1, 1]]
payoff = [[0, 0.2, 0.5],
          [0, 1, 0.6],
          [0, 0.2, 0.4]]
for r, h in enumerate(health):
    for i, _ in enumerate(h):
        health[r][i] += payoff[r][actions[r][i]]
print health
How can I use numpy to speed things up?
Bonus points for a solution that works in a reasonably stable version of PyPy.
Comment: EDIT: code should be working now

Using numpy you can use integer arrays directly to index other arrays. For example, A[[0, 1, 2], [11, 12, 13]] returns an array with 3 elements: A[0, 11], A[1, 12] and A[2, 13]. Here is how you can apply this concept to your problem.
import numpy as np
health = np.array([[0.5, 0.8], [0.1, 0.5], [0.5, 0.7]])
actions = np.array([[0, 1], [2, 0], [1, 1]])
payoff = np.array([[0, 0.2, 0.5],
                   [0, 1, 0.6],
                   [0, 0.2, 0.4]])
r, c = np.indices(health.shape)
health += payoff[r, actions]
This is called advanced indexing; here is a good reference for it.
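As a quick sanity check (a sketch of my own, not part of the answer above), the vectorized update can be compared against the original loop:
import numpy as np

health = np.array([[0.5, 0.8], [0.1, 0.5], [0.5, 0.7]])
actions = np.array([[0, 1], [2, 0], [1, 1]])
payoff = np.array([[0, 0.2, 0.5],
                   [0, 1, 0.6],
                   [0, 0.2, 0.4]])

# vectorized update via advanced indexing
r, c = np.indices(health.shape)
vectorized = health + payoff[r, actions]

# the original double loop, applied to a copy for comparison
looped = health.copy()
for i in range(health.shape[0]):
    for j in range(health.shape[1]):
        looped[i, j] += payoff[i, actions[i, j]]

print(np.allclose(vectorized, looped))  # True
# both give approximately [[0.5, 1.0], [0.7, 0.5], [0.7, 0.9]]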


Masking Using Pixel Statistics

I'm trying to mask bad pixels in a dataset taken from a detector. In my attempt to come up with a general way to do this, so I can run the same code across different images, I tried a few different methods, but none of them ended up working. I'm pretty new to coding and data analysis in Python, so I could use a hand putting things in terms the computer will understand.
As an example, consider the matrix
A = np.array([[3,5,50],[30,2,6],[25,1,1]])
What I want to do is set any element in A that is more than two standard deviations away from the mean to zero. The reason for this is that later in the code I define a function that uses only the nonzero values for the calculation, since the zeros are part of the mask.
I know the following masking technique works, and I tried extending it to work with the standard deviation:
mask = np.ones(np.shape(A))
mask.flat[A.flat > 20] = 0
What I tried was:
mask = np.ones(np.shape(A))
for i,j in A:
    mask.flat[A[i,j] - 2*np.std(A) < np.mean(A) < A[i,j] + 2*np.std(A)] = 0
Which throws the error:
ValueError: too many values to unpack (expected 2)
If anyone has a better technique to statistically remove bad pixels in an image, I'm all ears. Thanks for the help!
==========
EDIT
After some trial and error, I got to a place that could help clarify my question. The new code is:
for i in A:
    for j in i:
        mask.flat[j - 2*np.std(A) < np.mean(A) < j + 2*np.std(A)] = 0
This throws an error saying 'unsupported iterator index'. What I want to happen is for the loop to iterate across each element in the array, check whether it's less/greater than 2 standard deviations from the mean, and if it is, set it to zero.
Here is an approach that will be slightly faster on larger images:
import numpy as np
import matplotlib.pyplot as plt
# generate dummy image
a = np.random.randint(1,5, (5,5))
# generate dummy outliers
a[4,4] = 20
a[2,3] = -6
# initialise mask
mask = np.ones_like(a)
# subtract mean and normalise to standard deviation.
# then any pixel in the resulting array that has an absolute value > 2
# is more than two standard deviations away from the mean
cond = (a-np.mean(a))/np.std(a)
# find those pixels and set them to zero.
mask[abs(cond) > 2] = 0
Inspection:
a
array([[ 1, 1, 3, 4, 2],
[ 1, 2, 4, 1, 2],
[ 1, 4, 3, -6, 1],
[ 2, 2, 1, 3, 2],
[ 4, 1, 3, 2, 20]])
np.round(cond, 2)
array([[-0.39, -0.39, 0.11, 0.36, -0.14],
[-0.39, -0.14, 0.36, -0.39, -0.14],
[-0.39, 0.36, 0.11, -2.12, -0.39],
[-0.14, -0.14, -0.39, 0.11, -0.14],
[ 0.36, -0.39, 0.11, -0.14, 4.32]])
mask
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 0, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 0]])
Your A is a 3x3 array, so each row contains three values and can't be unpacked into two loop variables. Iterate over the individual elements with nested loops instead, like below.
A = np.array([[3,5,50],[30,2,6],[25,1,1]])
for i in A:
    for j in i:
        print(j)
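A fully vectorized way to build the same 2-sigma mask, as a small sketch of my own (no loops, same rule as the first answer):
import numpy as np

A = np.array([[3, 5, 50], [30, 2, 6], [25, 1, 1]])
# keep pixels within two standard deviations of the mean, zero out the rest
mask = (np.abs(A - A.mean()) <= 2 * A.std()).astype(int)
print(mask)  # here only the 50 is more than two standard deviations away, so only it gets masked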

Matrix Expression from String

Context: I'm doing a bunch of simulations that require me to implement different Hamiltonians. These Hamiltonians are just matrices, built out of Kronecker products of some common elements, with some prefactors that I have to calculate based on the system parameters. E.g, using ⊗ for the Kronecker product
H = w1(a,b,c) * sigmax ⊗ I + w2(x,y,z)*I ⊗ sigmay
I was hoping I could make a simple parser that could read in the values of a,b,c,x,y,z and an expression for the Hamiltonian and construct the necessary matrix. Sympy seems like an obvious candidate, but I can't get a matrix expression to build using strings.
from sympy import symbols, Matrix, MatrixSymbol, eye
from sympy.physics import msigma
from sympy.physics.quantum import TensorProduct
w1,w2 = symbols('w1 w2')
X1 = MatrixSymbol('X1',4,4)
X2 = MatrixSymbol('X2',4,4)
x = msigma(1)
x_1 = TensorProduct(eye(2),x)
x_2 = TensorProduct(x,eye(2))
exp = w1*X1 + w2*X2
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
will work. But, trying
exp = MatrixExpr('w1*X1+w2*X2')
or
exp = MatrixExpr(sympify('w1*X1+w2*X2'))
or even
exp = sympify('w1*X1 + w2*X2')
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)])
won't.
It also won't work if I change w1 or w2 to be 1x1 instances of a MatrixSymbol.
What am I doing wrong here? This is my first time using sympy so I'm very clear that I may just be missing something.
Let's look at what's going on in a simpler case:
exp = sympify('w1*X1'); right_exp = w1*X1
type(exp), type(right_exp)
Out[47]: (sympy.core.mul.Mul, sympy.matrices.expressions.matmul.MatMul)
It looks like sympify doesn't understand that X1 is a matrix. So, if we state it explicitly, everything will be all right:
exp = sympify("w1*MatrixSymbol('X1',4,4)")
exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[49]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
right_exp.subs([(w1,0.5),(X1,x_1)]).as_explicit()
Out[50]:
Matrix([
[ 0, 0.5, 0, 0],
[0.5, 0, 0, 0],
[ 0, 0, 0, 0.5],
[ 0, 0, 0.5, 0]])
And the final statement:
exp = sympify("w1*MatrixSymbol('X1',4,4)+w2*MatrixSymbol('X2',4,4)")
exp.subs([(w1,0.5),(w2,2),(X1,x_1),(X2,x_2)]).as_explicit()
Out[63]:
Matrix([
[ 0, 0.5, 2, 0],
[0.5, 0, 0, 2],
[ 2, 0, 0, 0.5],
[ 0, 2, 0.5, 0]])
What's going on? If you read Basics of expressions in SymPy, you can find the statement that "matrices aren't sympifiable", so sympify interprets X1 as an ordinary symbol.
It's hard to say how it will behave in other situations. There is a note in the docs that warns:
Sometimes autosimplification during sympification results in
expressions that are very different in structure than what was
entered.
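As a possible workaround (my own sketch, building on the answer above): sympify accepts a locals mapping, so the names in the string can be bound to MatrixSymbols instead of being spelled out inline:
from sympy import symbols, sympify, MatrixSymbol, eye
from sympy.physics import msigma
from sympy.physics.quantum import TensorProduct

w1, w2 = symbols('w1 w2')
X1 = MatrixSymbol('X1', 4, 4)
X2 = MatrixSymbol('X2', 4, 4)
x = msigma(1)
x_1 = TensorProduct(eye(2), x)
x_2 = TensorProduct(x, eye(2))

# tell sympify that X1 and X2 are matrices, not plain symbols
expr = sympify('w1*X1 + w2*X2', locals={'X1': X1, 'X2': X2})
print(expr.subs([(w1, 0.5), (w2, 2), (X1, x_1), (X2, x_2)]).as_explicit())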

Count number of occurrences in repeated range

I want to count the number of occurrences/events within a range given a Numpy array of numbers.
For example, let's consider an array called arr and the result res of a function event_count:
import numpy as np
arr = np.array([0, 0.2, 0.3, 1, 1.5, 2])
bins = [0, 1, 2]
res = event_count(arr, bins=bins)
print(res)
>>> [3, 2, 1]
This is somewhat similar to what a histogram does with its bins argument, but I want to do it without creating a histogram plot. This is also similar to what bincount does, but I want a range instead of specific instances. This is also similar to this Finding Occurrences in a Range question, but I want a repeated range.
You can compute a histogram without plotting it. Here's an example building on the previous code:
import numpy as np
arr = np.array([0, 0.2, 0.3, 1, 1.5, 2])
bins = [0, 1, 2, 3]
res = np.histogram(arr, bins=bins)
print(res[0])
>>> [3, 2, 1]
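One detail worth noting: np.histogram returns both the counts and the bin edges, so a small sketch (my own) unpacking the tuple directly avoids the res[0] indexing:
import numpy as np

arr = np.array([0, 0.2, 0.3, 1, 1.5, 2])
# np.histogram returns (counts, bin_edges); the last bin is closed on the right,
# which is why the value 2 falls into the [2, 3] bin
counts, edges = np.histogram(arr, bins=[0, 1, 2, 3])
print(counts)  # [3 2 1]
print(edges)   # the bin edges, 0, 1, 2, 3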

Wrong result in a simple ipython script

I defined a simple function that operates on the 1st and 3rd rows of a 3x3 matrix:
In [1]: from numpy import *
from sympy import Symbol
In [2]: def ifsin(mat):
            mat[2,:]+=1
            mat[0,:]=mat[0,:]/pow(mat[2,:],mat[2,:]>0)
            return mat
In [3]:Ay=Symbol('Ay')
By=Symbol('By')
q=array([[Ay,-10 ,By],[0,0.4,1],[-1 ,-1, -1]])
q
Out[3]:
array([[Ay, -10, By],
[0, 0.4, 1],
[-1, -1, -1]], dtype=object)
In [4]:V=ifsin(q)
q
Out[4]: array([[Ay, -10.0, By],
[0, 0.4, 1],
[0, 0, 0]], dtype=object)
Why was the 3rd row of matrix q updated?
Also, if I evaluate the following:
In [5]:M=ifsin(V)
q
Out[5]: array([[Ay, -10.0, By],
[0, 0.4, 1],
[1, 1, 1]], dtype=object)
Again, the 3rd row of q is updated!
I tried this script in "Computable" (an iPad app) and in an IPython notebook on Ubuntu 14.04 (Python 2.7.6), with the same results.
Thanks in advance for your help.
updated…
I changed my function to:
def ifsin(m):
    return vstack([m[0,:]/pow(m[2,:]+1,(m[2,:]+1)>0),m[1,:],m[2,:]+1])
And now all works fine. Thanks!!
You are literally asking for that update on this line:
mat[2,:]+=1
In numpy syntax, this means "update row 2 (the third row) of the array named mat in place by incrementing each value by 1".
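Because ifsin returns the very array it modified, V, M and q all refer to the same object, which is why each call appears to change q again. A minimal non-mutating sketch (my own variation) works on a copy instead:
import numpy as np
from sympy import Symbol

def ifsin(mat):
    mat = mat.copy()          # work on a copy; the caller's array stays untouched
    mat[2, :] += 1
    mat[0, :] = mat[0, :] / pow(mat[2, :], mat[2, :] > 0)
    return mat

Ay, By = Symbol('Ay'), Symbol('By')
q = np.array([[Ay, -10, By], [0, 0.4, 1], [-1, -1, -1]])
V = ifsin(q)   # q is left unchanged
M = ifsin(V)   # V is left unchanged too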

How to fold/accumulate a numpy matrix product (dot)?

Using the Python library numpy, it is possible to use the function cumprod to evaluate cumulative products, e.g.
a = np.array([1,2,3,4,2])
np.cumprod(a)
gives
array([ 1, 2, 6, 24, 48])
It is also possible to apply this function along just one axis, as shown below.
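For instance, applied row-wise (a small illustrative sketch):
import numpy as np
m = np.array([[1, 2, 3],
              [4, 5, 6]])
# cumulative product along each row
print(np.cumprod(m, axis=1))
# [[  1   2   6]
#  [  4  20 120]]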
I would like to do the same with matrices (represented as numpy arrays), e.g. if I have
S0 = np.array([[1, 0], [0, 1]])
Sx = np.array([[0, 1], [1, 0]])
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]])
and
b = np.array([S0, Sx, Sy, Sz])
then I would like to have a cumprod-like function which gives
np.array([S0, S0.dot(Sx), S0.dot(Sx).dot(Sy), S0.dot(Sx).dot(Sy).dot(Sz)])
(This is a simple example, in reality I have potentially large matrices evaluated over n-dimensional meshgrids, so I seek for the most simple and efficient way to evaluate this thing.)
In e.g. Mathematica I would use
FoldList[Dot, IdentityMatrix[2], {S0, Sx, Sy, Sz}]
so I searched for a fold function, and all I found was an accumulate method on numpy ufuncs. To be honest, I know that I am probably doomed, because an attempt at
np.core.umath_tests.matrix_multiply.accumulate(np.array([pauli_0, pauli_x, pauli_y, pauli_z]))
as mentioned in a numpy mailing list yields the error
Reduction not defined on ufunc with signature
Do you have an idea how to (efficiently) do this kind of calculation?
Thanks in advance.
As food for thought, here are 3 ways of evaluating the 3 sequential dot products:
With the normal Python reduce (which could also be written as a loop)
In [118]: reduce(np.dot,[S0,Sx,Sy,Sz])
array([[ 0.+1.j,  0.+0.j],
       [ 0.+0.j,  0.+1.j]])
The einsum equivalent
In [119]: np.einsum('ij,jk,kl,lm',S0,Sx,Sy,Sz)
The einsum index expression looks like a sequence of operations, but it is actually evaluated as a 5d product with summation on 3 axes. In the C code this is done with an nditer and strides, but the effect is as follows:
In [120]: np.sum(S0[:,:,None,None,None] * Sx[None,:,:,None,None] *
                 Sy[None,None,:,:,None] * Sz[None,None,None,:,:], (1,2,3))
In [127]: np.prod(np.broadcast_arrays(S0[:,:,None,None,None], Sx[None,:,:,None,None],
                  Sy[None,None,:,:,None], Sz[None,None,None,:,:]), axis=0).sum((1,2,3))
A while back, while creating a patch for np.einsum, I translated that C code to Python, and also wrote a Cython sum-of-products function. This code is on github at
https://github.com/hpaulj/numpy-einsum
einsum_py.py is the Python einsum, with some useful debugging output
sop.pyx is the Cython code, which is compiled to sop.so.
Here's how it could be used for part of your problem. I'm skipping the Sy array since my sop is not coded for complex numbers (but that could be changed).
import numpy as np
import sop
import einsum_py
S0 = np.array([[1., 0], [0, 1]])
Sx = np.array([[0., 1], [1, 0]])
Sz = np.array([[1., 0], [0, -1]])
print np.einsum('ij,jk,kl', S0, Sx, Sz)
# [[ 0. -1.] [ 1. 0.]]
# same thing, but with parsing information
einsum_py.myeinsum('ij,jk,kl', S0, Sx, Sz, debug=True)
"""
{'max_label': 108, 'min_label': 105, 'nop': 3,
'shapes': [(2, 2), (2, 2), (2, 2)],
'strides': [(16, 8), (16, 8), (16, 8)],
'ndim_broadcast': 0, 'ndims': [2, 2, 2], 'num_labels': 4,
....
op_axes [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
"""
# take op_axes (for np.nditer) from this debug output
op_axes = [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
w = sop.sum_product_cy3([S0,Sx,Sz], op_axes)
print w
As written, sum_product_cy3 cannot take an arbitrary number of ops. Plus the iteration space increases with each op and index. But I can imagine calling it repeatedly, either at the Cython level or from Python. I think it has potential to be faster than repeated dot calls for lots of small arrays.
A condensed version of the Cython code is:
def sum_product_cy3(ops, op_axes, order='K'):
    #(arr, axis=None, out=None):
    cdef np.ndarray[double] x, y, z, w
    cdef int size, nop
    nop = len(ops)
    ops.append(None)
    flags = ['reduce_ok','buffered', 'external_loop'...]
    op_flags = [['readonly']]*nop + [['allocate','readwrite']]
    it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
    it.operands[nop][...] = 0
    it.reset()
    for x, y, z, w in it:
        for i in range(x.shape[0]):
            w[i] = w[i] + x[i] * y[i] * z[i]
    return it.operands[nop]
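For the FoldList-like cumulative product the question actually asks for, a plain-Python sketch (my own, assuming Python 3) with itertools.accumulate keeps every intermediate dot product; it is still a loop under the hood, just a compact one:
import numpy as np
from itertools import accumulate

S0 = np.array([[1, 0], [0, 1]])
Sx = np.array([[0, 1], [1, 0]])
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]])
b = np.array([S0, Sx, Sy, Sz])

# accumulate folds np.dot from the left over the stack of matrices,
# yielding S0, S0.Sx, S0.Sx.Sy, S0.Sx.Sy.Sz
cum = np.array(list(accumulate(b, np.dot)))
print(cum.shape)  # (4, 2, 2)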
