Computing the mean of an array considering only some indices - python

I have two 2d arrays, one containing float values, one containing bool. I want to create an array containing the mean values of the first matrix for each column considering only the values corresponding to False in the second matrix.
For example:
A = [[1 3 5]
[2 4 6]
[3 1 0]]
B = [[True False False]
[False False False]
[True True False]]
result = [2, 3.5, 3.67]

Where B is False, keep the value of A; otherwise make it NaN. Then use the nanmean function, which ignores NaNs when computing the mean.
np.nanmean(np.where(~B, A, np.nan), axis=0)
>>> array([2. , 3.5 , 3.66666667])

Use numpy.mean with its where argument to specify which elements to include in the mean.
np.mean(A, where = ~B, axis = 0)
>>> [2. 3.5 3.66666667]
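For reference, here is a self-contained sketch of the two NumPy approaches above. It assumes A and B are the arrays from the question, converted to NumPy arrays; note that the where argument of np.mean needs NumPy 1.20 or newer.

```python
import numpy as np

A = np.array([[1, 3, 5],
              [2, 4, 6],
              [3, 1, 0]], dtype=float)
B = np.array([[True, False, False],
              [False, False, False],
              [True, True, False]])

# Approach 1: replace excluded entries with NaN, then take the NaN-aware mean
via_nanmean = np.nanmean(np.where(~B, A, np.nan), axis=0)

# Approach 2: let np.mean skip the excluded entries directly
via_where = np.mean(A, where=~B, axis=0)

print(via_nanmean)
print(via_where)
```

Both variants should agree as long as every column has at least one False in B.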

A = [[1, 3, 5],
[2, 4, 6],
[3, 1, 0]]
B = [[True, False, False],
[False, False, False],
[True, True, False]]
sums = [0]*len(A[0])
amounts = [0]*len(A[0])
for i in range(0, len(A)):
    for j in range(0, len(A[0])):
        sums[j] = sums[j] + (A[i][j] if not B[i][j] else 0)
        amounts[j] = amounts[j] + (1 if not B[i][j] else 0)
result = [sums[i]/amounts[i] for i in range(0, len(sums))]
print(result)

There may be some fancy numpy trick for this, but I think using a list comprehension to construct a new array is the most straightforward.
result = np.array([a_col[~b_col].mean() for a_col, b_col in zip(A.T,B.T)])
To make it easier to follow, here is that line expanded out:
result=[]
for i in range(A.shape[1]):  # iterate over columns, not rows
    new_col = A[:,i][~B[:,i]]
    result.append(new_col.mean())

You could also use a masked array:
import numpy as np
result = np.ma.array(A, mask=B).mean(axis=0).filled(fill_value=0)
# Output:
# array([2. , 3.5 , 3.66666667])
which has the advantage of being able to supply a fill_value for when every element in some column in B is True.
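As a small illustration of that fill_value behaviour, here is a sketch with made-up data in which the first column is excluded entirely:

```python
import numpy as np

# Made-up example: every entry of the first column is masked out
A = np.array([[1.0, 3.0],
              [2.0, 4.0]])
B = np.array([[True, False],
              [True, False]])

masked_mean = np.ma.array(A, mask=B).mean(axis=0)

# The fully masked column has no defined mean; filled() substitutes 0 for it
result = masked_mean.filled(fill_value=0)
print(result)
```

With np.nanmean the same column would come back as NaN (with a RuntimeWarning), so the masked-array version can be more convenient when a default value is wanted.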

Related

how to implement this array algorithm in a more efficient way?

Assuming I have n = 3 lists of same length for example:
R1 = [7,5,8,6,0,6,7]
R2 = [8,0,2,2,0,2,2]
R3 = [1,7,5,9,0,9,9]
I need to find the first index t that verifies the n = 3 following conditions for a period p = 2.
Edit: the meaning of period p is the number of consecutive "boxes".
R1[t] >= 5, R1[t+1] >= 5. Here t + p - 1 = t + 1, so we only need to check two boxes, t and t+1. If p were equal to 3, we would need to check t, t+1 and t+2. Note that the threshold is always the same within a list: for R1 we always test whether the value is greater than or equal to 5, at every index. The condition is the same for all the "boxes".
R2[t] >= 2, R2[t+1] >= 2
R3[t] >= 9, R3[t+1] >= 9
In total there are 3 * p conditions.
Here the t I am looking for is 5 (indexing is starting from 0).
The basic way to do this is by looping on all the indexes using a for loop. If the condition is found for some index t we store it in some local variable temp and we verify the conditions still hold for every element whose index is between t+1 and t+p -1. If while checking, we find an index that does not satisfy a condition, we forget about the temp and we keep going.
What is the most efficient way to do this in Python if I have large lists (like of 10000 elements)? Is there a more efficient way than the for loop?
Since all your conditions are the same (>=), we could leverage this.
This solution will work for any number of conditions and any size of analysis window, and no for loop is used.
You have an array:
>>> R = np.array([R1, R2, R3]).T
>>> R
array([[7, 8, 1],
[5, 0, 7],
[8, 2, 5],
[6, 2, 9],
[0, 0, 0],
[6, 2, 9],
[7, 2, 9]])
and you have thresholds:
>>> thresholds = [5, 2, 9]
So you can check where the conditions are met:
>>> R >= thresholds
array([[ True, True, False],
[ True, False, False],
[ True, True, False],
[ True, True, True],
[False, False, False],
[ True, True, True],
[ True, True, True]])
And where they are all met at the same time:
>>> R_cond = np.all(R >= thresholds, axis=1)
>>> R_cond
array([False, False, False, True, False, True, True])
From there, you want the conditions to be met for a given window.
We'll use the fact that booleans can sum together, and convolution to apply the window:
>>> win_size = 2
>>> R_conv = np.convolve(R_cond, np.ones(win_size), mode="valid")
>>> R_conv
array([0., 0., 1., 1., 1., 2.])
The resulting array will have values equal to win_size at the indices where all conditions are met on the window range.
So let's retrieve the first of those indices:
>>> index = np.where(R_conv == win_size)[0][0]
>>> index
5
If no such index exists, this raises an IndexError; handling that case is left to you.
So, as a one-liner function, it gives:
def idx_conditions(arr, thresholds, win_size, condition):
    return np.where(
        np.convolve(
            np.all(condition(arr, thresholds), axis=1),
            np.ones(win_size),
            mode="valid"
        )
        == win_size
    )[0][0]
I added the condition as an argument to the function, to be more general.
>>> from operator import ge
>>> idx_conditions(R, thresholds, win_size, ge)
5
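One possible way to handle the no-match case mentioned above is to return None instead of raising. This is only a sketch: the name idx_conditions_or_none and the default condition=ge are my own choices, not part of the answer.

```python
import numpy as np
from operator import ge

def idx_conditions_or_none(arr, thresholds, win_size, condition=ge):
    """Same idea as idx_conditions, but return None when no window qualifies."""
    # True for each row where every column meets its threshold
    all_met = np.all(condition(arr, thresholds), axis=1)
    # Number of qualifying rows inside each window of length win_size
    counts = np.convolve(all_met, np.ones(win_size), mode="valid")
    hits = np.flatnonzero(counts == win_size)
    return int(hits[0]) if hits.size else None

R = np.array([[7, 5, 8, 6, 0, 6, 7],
              [8, 0, 2, 2, 0, 2, 2],
              [1, 7, 5, 9, 0, 9, 9]]).T
thresholds = [5, 2, 9]

print(idx_conditions_or_none(R, thresholds, 2))
print(idx_conditions_or_none(R, thresholds, 3))
```

With the sample data there is a valid window of length 2 but none of length 3, so the second call returns None.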
This could be a way:
R1 = [7,5,8,6,0,6,7]
R2 = [8,0,2,2,0,2,2]
R3 = [1,7,5,9,0,9,9]
for i,inext in zip(range(len(R1)),range(len(R1))[1:]):
    if (R1[i]>=5 and R1[inext]>=5) and (R2[i]>=2 and R2[inext]>=2) and (R3[i]>=9 and R3[inext]>=9):
        print(i)
Output:
5
Edit: Generalization could be:
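def foo(ls,conditions):
    # Return the first index i where every list meets its threshold at i and i+1
    for i,inext in zip(range(len(ls[0])),range(len(ls[0]))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j]) for j in range(len(ls))):
            return i
    return None  # no index satisfies the conditions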
R1 = [7,5,8,6,0,6,7]
R2 = [8,0,2,2,0,2,2]
R3 = [1,7,5,9,0,9,9]
R4 = [1,7,5,9,0,1,1]
R5 = [1,7,5,9,0,3,3]
conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]
print(foo(ls,conditions))
Output:
5
And if the arrays can match the conditions more than once, you could return a list of the indexes:
def foo(ls,conditions):
    index=[]
    # Use the length of the lists passed in, not the global R1
    for i,inext in zip(range(len(ls[0])),range(len(ls[0]))[1:]):
        if all((ls[j][i]>=conditions[j] and ls[j][inext]>=conditions[j]) for j in range(len(ls))):
            index.append(i)
    return index
R1 = [6,7,8,6,0,6,7]
R2 = [2,2,2,2,0,2,2]
R3 = [9,9,5,9,0,9,9]
R4 = [1,1,5,9,0,1,1]
R5 = [3,3,5,9,0,3,3]
conditions=[5,2,9,1,3]
ls=[R1,R2,R3,R4,R5]
print(foo(ls,conditions))
Output:
[0, 5]
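The loops above hard-code a period of 2. A rough sketch of the same idea for an arbitrary period p is to keep a running count of consecutive rows that satisfy all conditions; the function name first_run_index is hypothetical.

```python
def first_run_index(lists, thresholds, p):
    """Return the first t such that all lists meet their thresholds
    at t, t+1, ..., t+p-1, or None if there is no such t."""
    run = 0  # length of the current streak of qualifying positions
    for t, values in enumerate(zip(*lists)):
        if all(v >= th for v, th in zip(values, thresholds)):
            run += 1
            if run == p:
                return t - p + 1  # start of the streak
        else:
            run = 0
    return None

R1 = [7, 5, 8, 6, 0, 6, 7]
R2 = [8, 0, 2, 2, 0, 2, 2]
R3 = [1, 7, 5, 9, 0, 9, 9]
print(first_run_index([R1, R2, R3], [5, 2, 9], 2))
```

This makes a single pass over the data and stops at the first match, so it never re-checks positions it has already visited.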
Here is a solution using numpy, without for loops:
import numpy as np
R1 = np.array([7,5,8,6,0,6,7])
R2 = np.array([8,0,2,2,0,2,2])
R3 = np.array([1,7,5,9,0,9,9])
a = np.logical_and(np.logical_and(R1>=5,R2>=2),R3>=9)
np.where(np.logical_and(a[:-1],a[1:]))[0].item()
output
5
Edit:
Generalization
Say you have a list of lists R and a list of conditions c:
R = [[7,5,8,6,0,6,7],
[8,0,2,2,0,2,2],
[1,7,5,9,0,9,9]]
c = [5,2,9]
First we convert them to numpy arrays. reshape(-1,1) turns c into a column vector so that NumPy's broadcasting applies in the >= comparison:
R = np.array(R)
c = np.array(c).reshape(-1,1)
R>=c
output:
array([[ True, True, True, True, False, True, True],
[ True, False, True, True, False, True, True],
[False, False, False, True, False, True, True]])
Then we combine all rows with a logical AND using the reduce method:
a = np.logical_and.reduce(R>=c)
a
output:
array([False, False, False, True, False, True, True])
Next we create two arrays by dropping the last and the first element of a, respectively, and take the logical AND of the two. The result shows where two consecutive elements satisfy the conditions in all lists:
np.logical_and(a[:-1],a[1:])
output:
array([False, False, False, False, False, True])
Now np.where gives the index of the True element:
np.where(np.logical_and(a[:-1],a[1:]))[0].item()
output:
5
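The a[:-1] / a[1:] trick covers a window of two. For a longer window, one option (a sketch, assuming NumPy 1.20 or newer for sliding_window_view; p is my own parameter name) is:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

R = np.array([[7, 5, 8, 6, 0, 6, 7],
              [8, 0, 2, 2, 0, 2, 2],
              [1, 7, 5, 9, 0, 9, 9]])
c = np.array([5, 2, 9]).reshape(-1, 1)
p = 2  # window length

# True where every list meets its threshold at that index
a = np.logical_and.reduce(R >= c)

# Each row of the view is one window of p consecutive entries of a
window_ok = sliding_window_view(a, p).all(axis=1)
matches = np.flatnonzero(window_ok)
first = int(matches[0]) if matches.size else None
print(first)
```

Unlike .item(), this also copes with zero or several matching windows.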

All boolean combinations from 2 numpy arrays

Is there an existing function in numpy that takes 2 numpy arrays (x,y) and returns a boolean matrix for each i,j (x[i]>y[j])
For example, let x = [3, 4 ,5] and y = [1, 2, 3] and I want
res = [ [True, True, False],
[True, True, True],
[True, True, True] ]
You don't need a function here, just array broadcasting can work if you shape your arrays properly. I think you want this approach, which makes x a column vector and y a row vector:
x = np.array([3,4,5])
y = np.array([1,2,3])
res = x[:,None] > y[None,:]
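A runnable version of the broadcasting approach, together with np.greater.outer, which as far as I know gives the same all-pairs result:

```python
import numpy as np

x = np.array([3, 4, 5])
y = np.array([1, 2, 3])

# Column vector against row vector: entry [i, j] is x[i] > y[j]
res_broadcast = x[:, None] > y[None, :]

# The ufunc's outer method applies the comparison to every (i, j) pair
res_outer = np.greater.outer(x, y)

print(res_broadcast)
print(np.array_equal(res_broadcast, res_outer))
```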
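Using numpy, you can cast your x and y lists to arrays like so:
x = np.array([3,4,5])
y = np.array([1,2,3])
Note, however, that print(x > y) does an elementwise comparison of x[i] with y[i] and gives [ True True True], not the full i,j matrix. For all pairs you need broadcasting, as in x[:,None] > y.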

Numpy find agreement between columns in array

I have labels from 'n' different people who rated 'm' items (either 0 or 1), so an m x n array. For example, 3 people rating 4 items:
arr = np.asarray([[1,1,1], [1,1,0], [0,0,0], [0, 1, 0]])
print(arr)
>>>
[[1 1 1]
[1 1 0]
[0 0 0]
[0 1 0]]
I want to see on which items everyone "agreed", i.e. all values in the row are the same. In this example the answer is [True, False, True, False]. I got it working using this:
np.logical_or(arr.sum(axis=1) == n, arr.sum(axis=1) == 0)
Kind of hacky. What's a better way of doing this?
One alternative would be to calculate the diff along the rows and then check whether all the diffs are equal to 0. This ensures that all elements in a row are the same (and it also works for values other than 0 and 1):
(np.diff(arr, axis=1) == 0).all(axis=1)
# array([ True, False, True, False], dtype=bool)
Or if you have only 0s and 1s, then:
(arr == 1).all(1) | (arr == 0).all(1)
# array([ True, False, True, False], dtype=bool)
arr.all(1) | ~arr.any(1)
# array([ True, False, True, False], dtype=bool)
I think len(set(.)) is basically the is_uniform function that you are looking for:
[len(set(x)) == 1 for x in arr]
Note that this solution is very general, it does not require:
the same number of people voted on each item
values to be numeric or any particular type
additional package on top of core python
Or use a list comprehension that checks, for each row i, whether the count of its first element equals the row's length (that is, whether all values in the row are the same):
print([len(i)==i.tolist().count(i[0]) for i in arr])
Output:
[True, False, True, False]
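Another vectorized option, sketched here under the assumption that arr is a regular 2D NumPy array, is to compare every column with the first one. It is not limited to 0/1 labels.

```python
import numpy as np

arr = np.asarray([[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]])

# arr[:, :1] keeps the first column as shape (m, 1) so it broadcasts across each row
agreed = (arr == arr[:, :1]).all(axis=1)
print(agreed)
```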

Python how to set values in a matrix given an array element of which representing the column of the matrix

Suppose I have a N x N matrix M and a N elements array A. A[i] represents M[i, A[i]] entry in M. How do I quickly set corresponding entry in M to value 1 given the array A?
By using numpy so far, what I tried is:
M[0:A.shape[0], A]=1
But this does not work, and I don't want to use a loop, which is costly when N is big.
You can create a mask, then use it to set all the values to 1. In this case (for a 4x4 matrix and A=[1,3,2,0]), the mask can be created by:
A = np.array([1, 3, 2, 0])
mask = np.zeros((4, 4), int)
np.fill_diagonal(mask, 1)
mask = mask[A, :] > 0
Which produces mask:
[[False True False False]
[False False False True]
[False False True False]
[ True False False False]]
You can then easily apply the mask to a 4x4 matrix M and set the corresponding values to 1.
np.random.seed(42)
M = np.random.uniform(0, 1, 16).reshape(4, 4)
M[mask] = 1
The result is:
[[ 0.37454012 1. 0.73199394 0.59865848]
[ 0.15601864 0.15599452 0.05808361 1. ]
[ 0.60111501 0.70807258 1. 0.96990985]
[ 1. 0.21233911 0.18182497 0.18340451]]
Or you can do it all with a simple for loop, which produces the same result.
A = np.array([1, 3, 2, 0])
np.random.seed(42)
M = np.random.uniform(0, 1, 16).reshape(4, 4)
for i, a in enumerate(A):
    M[i, a] = 1
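A loop-free alternative that avoids building a mask is integer array indexing: pair each row index with its column from A. A minimal sketch, using a zero matrix for clarity:

```python
import numpy as np

A = np.array([1, 3, 2, 0])
M = np.zeros((4, 4))

# Row indices 0..N-1 are paired element by element with the columns in A,
# so this sets M[i, A[i]] = 1 for every i
M[np.arange(len(A)), A] = 1
print(M)
```

The attempt in the question, M[0:A.shape[0], A] = 1, mixes a slice with an index array, which selects every listed column in every row rather than one entry per row.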

How to find negative elements in a multidimensional array? Use .any() .all() correctly

I have a numpy array arr with negative double elements. It is shaped (1000,1000). As the elements are complex, we use arr.real to only evaluate the real part.
I first tried
for i in arr.real:
    if i < 0:
        print(i)
This gave the following ValueError:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
However, if I try
for i in arr.real:
    if i.any() < 0:
        print(i)
or
for i in arr.real:
    if i.all() < 0:
        print(i)
there is no output. Nothing is printed, even though negative values do exist.
How do I rectify this? What am I misunderstanding?
EDIT:
for i in arr.real:
    print(i[i<0])
does work. However, how does one search for two conditions? For example,
i < 0 and i > -1e-12
Since arr is 2d, iteration gives you the rows, not the elements.
Make a sample array:
In [347]: arr=np.arange(16).reshape(4,4)-10
In [348]: arr
Out[348]:
array([[-10, -9, -8, -7],
[ -6, -5, -4, -3],
[ -2, -1, 0, 1],
[ 2, 3, 4, 5]])
Iterate with some prints:
In [350]: for i in arr:
    print(i)
    print(i<0)
    print((i<0).any())
.....:
result:
[-10 -9 -8 -7]
[ True True True True]
True
[-6 -5 -4 -3]
[ True True True True]
True
[-2 -1 0 1]
[ True True False False]
True
[2 3 4 5]
[False False False False]
False
The ValueError results when you try to use that boolean array np.array([True, True, False, False]) in an if statement. Applying all or any to the array reduces it to one scalar True/False value, which works in the if statement.
You can apply the negative test to the whole array, and apply all/any to rows (or columns) - without iteration:
In [351]: arr<0
Out[351]:
array([[ True, True, True, True],
[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False]], dtype=bool)
In [352]: (arr<0).any(axis=1)
Out[352]: array([ True, True, True, False], dtype=bool)
In [353]: (arr<0).all(axis=1)
Out[353]: array([ True, True, False, False], dtype=bool)
To get the non-negative values in the array, you can use this boolean mask (or its negation):
In [354]: arr[arr>=0]
Out[354]: array([0, 1, 2, 3, 4, 5])
Because there are different numbers of valid values in each row it can't give you a 2d array.
But you can go back to iteration to get a list of values for each row. Here I use a list comprehension to do the iteration.
In [355]: [a[a>=0] for a in arr]
Out[355]:
[array([], dtype=int32),
array([], dtype=int32),
array([0, 1]),
array([2, 3, 4, 5])]
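To tie this back to the question: i.any() < 0 prints nothing because any() returns a single boolean, and a boolean is never less than 0. The comparison has to happen before the reduction. For the two-condition case, the element-wise masks can be combined with &. A small sketch with made-up values:

```python
import numpy as np

arr = np.array([[-1e-13, 2.0],
                [-3.0, -5e-13]])  # made-up sample values

row = arr[0]
print(row.any() < 0)    # compares a single bool with 0
print((row < 0).any())  # compare first, then reduce

# Two conditions: combine boolean masks with & (the parentheses are required,
# and Python's `and` would raise the same ValueError as in the question)
mask = (arr < 0) & (arr > -1e-12)
print(arr[mask])
```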
Have you tried lambda expressions?
for k in arr:
    print(list(filter(lambda x: x < 0, k)))
If your "array" is a dictionary with tuple keys, you can use a lambda like this:
d = {(0,0):-3,(0,1):3,(0,2):-3.7,(0,3):0,
(1,0):30,(1,1):-12,(1,2):-0.1,(1,3):2.5,}
keys = list(filter(lambda x: d[x] < 0, d))
print([d[k] for k in keys])
Perhaps this will help to actually answer your question:
import itertools as it
aK = it.product(range(len(arr)), range(len(arr[0])))
negKeys = list(filter(lambda x: arr[x[0]][x[1]] < 0, aK))
negVals = [arr[k[0]][k[1]] for k in negKeys]
It seems one way to do this is:
for row in arr.real:
    for i in row:
        if (i<0 and i<-1e-10):
            print(i)
