Nested numpy operations - python

I have a function like this:
import numpy as np
def foo(v, w):
    return sum(np.exp(v / w))
where v is a numpy array and w is a number. Now I want to plot the value of this function for several values of w, so I need a function that works when w is a vector of arbitrary length.
My solution for now is the obvious one:
r = []
for e in w:
    r.append(foo(v, e))
but I wonder if there is a better way to do it. I also want to keep memory usage low, so I need to avoid creating a big matrix, applying the function to every value, and summing over the columns (the length of v is more than 5e+4 and the length of w is 1e+3).
Thanks

If you cannot determine an upper bound for the length of v and ensure that you stay within your memory limits, I think you will have to stick with your solution.
If you can determine an upper bound M for the length of v and meet your memory requirements with a 1000×M array, you can do this:
import numpy as np
v = np.array([1, 2, 3, 4, 5])
w = np.array([10., 5.])
c = v / w[:, np.newaxis]  # broadcasting: shape (len(w), len(v))
d = np.exp(c)
e = d.sum(axis=1)         # one result per element of w
>>> v
array([1, 2, 3, 4, 5])
>>> w
array([ 10.,   5.])
>>> c
array([[ 0.1,  0.2,  0.3,  0.4,  0.5],
       [ 0.2,  0.4,  0.6,  0.8,  1. ]])
>>> d
array([[ 1.10517092,  1.22140276,  1.34985881,  1.4918247 ,  1.64872127],
       [ 1.22140276,  1.4918247 ,  1.8221188 ,  2.22554093,  2.71828183]])
>>> e
array([ 6.81697845,  9.47916901])
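If no such bound exists, a middle ground between the loop and the full matrix is to process w in chunks, so the temporary array never grows beyond a fixed size. A minimal sketch, assuming chunked evaluation is acceptable (foo_batched and chunk_size are my own names, not from the question):
import numpy as np
def foo_batched(v, w, chunk_size=100):
    out = np.empty(len(w))
    for i in range(0, len(w), chunk_size):
        chunk = w[i:i + chunk_size]
        # temporary of shape (<=chunk_size, len(v)), reduced immediately
        out[i:i + chunk_size] = np.exp(v / chunk[:, np.newaxis]).sum(axis=1)
    return out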

Related

How to divide an array into several sections?

I have an array of length approximately 12000, something like array([0.3, 0.6, 0.3, 0.5, 0.1, 0.9, 0.4, ...]). I also have a column in a dataframe that provides values like 2, 3, 7, 3, 2, 7, .... The length of the column is 48, and the sum of those values is 36.
I want to distribute the array over those values, meaning the 12000 elements are split into consecutive sections whose lengths are proportional to the column values. For example, the first value in that column (= 2) gets its own section of length 12000*(2/36) (starting with something like [0.3, 0.6, 0.3, ...]), the second value (= 3) gets a section of length 12000*(3/36) that continues where the first one ended (something like [0.5, 0.1, 0.9, 0.4, ...]), and so on.
import pandas as pd
import numpy as np
# mock some data
a = np.random.random(12000)
df = pd.DataFrame({'col': np.random.randint(1, 5, 48)})
# cumulative section boundaries, proportional to the column values
indices = (len(a) * df.col.to_numpy() / sum(df.col)).cumsum()
indices = np.concatenate(([0], indices)).round().astype(int)
res = []
for s, e in zip(indices[:-1], indices[1:]):
    res.append(a[s:e])
# some tests
target_pcts = df.col.to_numpy() / sum(df.col)
realized_pcts = np.array([len(sl) / len(a) for sl in res])
diffs = target_pcts / realized_pcts
assert 0.99 < np.min(diffs) and np.max(diffs) < 1.01
assert all(np.concatenate([*res]) == a)
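For what it's worth, once the integer boundaries in indices are computed, the loop can be replaced by np.split, which slices a at the interior boundaries and should return the same list of sections:
res = np.split(a, indices[1:-1])  # split at the interior boundaries only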

numpy, pass individual arguments to np.vectorized function

If I have an array of x values and want to weight each x value with different coefficients and sum them, but I want this operation to happen by passing a function that handles the weighting and summation. For example, say I have x, coeffs, and a function custom_weight(x, a, b, c):
x = numpy.array([1, 2, 3, 4, 5, 6])
coeffs = numpy.array([[0.1, 0.2, 3.2], [4.5, 4.0, 0.005]])
def custom_weight(x, a, b, c):
    return a*x**2 + (x+b)**3 + x*c
I want x to be broadcast for each inner array of coeffs; in this case the final result should be an array with the shape (6, 2). The first call of custom_weight should look like custom_weight(x[0], *coeffs[0]) == custom_weight(1, 0.1, 0.2, 3.2), and the same happens for all the other x's 2-6. Then this happens again with the x's, but now using the second set of coefficients.
I do realize that I could do this manually, or with numpy.vectorize in a certain way, but I specifically want to use a function in that form. What I want is some function that would look like so:
numpy.the_function(x, coeffs, axis=0, custom_weight)
# the_function should take each x value and pass it to custom_weight as the first arg,
# then pass the column of coeffs (because axis=0) to custom_weight,
# unpacking the column into the args a, b, and c
The problem is more that your custom_weight function is not designed to be vectorized. You are looking for something like this:
def custom_weight(x, coeffs):
    return coeffs @ x**np.array([[2, 3, 1]]).T
Output:
array([[   3.5  ,    8.4  ,   15.9  ,   27.2  ,   43.5  ,   66.   ],
       [   8.505,   50.01 ,  148.515,  328.02 ,  612.525, 1026.03 ]])
So after messing around, one solution I found was to vectorize: transpose the coefficients when passing the arguments to custom_weight and then unpack them; broadcasting and np.vectorize take care of the rest.
import numpy as np
def custom_weight(x, a, b):
    return a*x**2 + b
x = np.linspace(-1, 1, 100)
coeffs = np.array([[0.2, 0.6],
                   [1.2, 0.1]])
vec_custom_weight = np.vectorize(custom_weight)
results = vec_custom_weight(x[:, np.newaxis], *coeffs.T).T
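Since this custom_weight only uses arithmetic that NumPy already broadcasts elementwise, plain broadcasting would arguably suffice here, avoiding the Python-level loop np.vectorize runs under the hood. A sketch reusing x, coeffs, and custom_weight from above:
a, b = coeffs.T  # unpack the coefficient columns, each of shape (2,)
results = custom_weight(x[:, np.newaxis], a, b).T  # shape (2, 100), same as above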

How to sample in a matrix according to the probability in each cell

I tried to code a formula from pattern recognition, but I cannot find a proper function to do the work. The problem is that I have a binary adjacency matrix A (M*N) and want to assign the value 1 or 0 to each cell. Every cell has a fixed probability P of being 1, and 0 otherwise. I searched for sampling methods in Python, and it seems that most methods only support sampling several elements from a list without considering probability. I really need help with this, and any idea is appreciated.
You could use
A = (P > numpy.random.rand(4, 5)).astype(int)
Where P is your matrix of probabilities.
To make sure the probabilities are right, you can test it using
P = numpy.ones((4, 5)) * 0.2
S = numpy.zeros((4, 5))
for i in range(100000):
    S += (P > numpy.random.rand(4, 5)).astype(int)
print(S)        # each element should be approximately 20000
print(S.mean()) # the average should be approximately 20000, too
Let's say you have your matrix of adjacency probabilities as follows:
# Create your matrix
matrix = np.random.randint(0, 10, (3, 3)) / 10.
# Returns:
array([[ 0. ,  0.4,  0.2],
       [ 0.9,  0.7,  0.4],
       [ 0.1,  0. ,  0.5]])
# Now you can use np.where; set the threshold as you like (here 0.5)
threshold = 0.5
np.where(matrix < threshold, 0, 1)
# Returns:
array([[0, 0, 0],
       [1, 1, 0],
       [0, 0, 1]])
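As an aside, NumPy can also draw the 0/1 samples directly from per-cell probabilities. A minimal sketch, assuming P is the matrix of probabilities and numpy is imported as np:
A = np.random.binomial(1, P)  # one independent Bernoulli draw per cell; result has P's shape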

Calculate moving average in numpy array with NaNs

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:
import numpy as np
def moving_average(a, n=5):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n-1:] / n
When calculating with a masked array:
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
mx = np.ma.masked_array(x, np.isnan(x))
y = moving_average(mx).filled(np.nan)
print(y)
>>> array([3.8, 3.8, 3.6, nan, nan, nan, 2, 2.4, nan, nan, nan, 2.8, 2.6])
The result I am looking for (below) should ideally have NaNs only where the original array x had NaNs, and the averaging should be done over the number of non-NaN elements in each window (I also need some way to change the window size n in the function):
y = array([4.75, 4.75, nan, 4.4, 3.75, 2.33, 3.33, 4, nan, nan, 3, 3.5, nan, 3.25, 4, 4.5, 3])
I could loop over the entire array and check index by index, but the array I am using is very large and that would take a long time. Is there a numpythonic way to do this?
Pandas has a lot of really nice functionality for this. For example:
import pandas as pd
x = np.array([np.nan, np.nan, 3, 3, 3, np.nan, 5, 7, 7])
# requires three valid values in a row, or the resulting value is null
print(pd.Series(x).rolling(3).mean())
# output: nan, nan, nan, nan, 3, nan, nan, nan, 6.333
# only requires 2 valid values out of 3 for a size-3 window
print(pd.Series(x).rolling(3, min_periods=2).mean())
# output: nan, nan, nan, 3, 3, 3, 4, 6, 6.333
You can play around with the window/min_periods arguments and consider filling in nulls, all in one chained line of code.
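For instance, one possible chained version (a sketch reusing x from above; .ffill() forward-fills whatever NaNs the rolling window could not cover, so only the leading NaNs remain):
y = pd.Series(x).rolling(3, min_periods=2).mean().ffill().to_numpy()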
I'll just add to the great answers above that you could still use cumsum to achieve this:
import numpy as np
def moving_average(a, n=5):
    ret = np.cumsum(a.filled(0))
    ret[n:] = ret[n:] - ret[:-n]
    counts = np.cumsum(~a.mask)
    counts[n:] = counts[n:] - counts[:-n]
    ret[~a.mask] /= counts[~a.mask]
    ret[a.mask] = np.nan
    return ret
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
mx = np.ma.masked_array(x, np.isnan(x))
y = moving_average(mx)
You could create a temporary array and use np.nanmean() (new in version 1.8, if I'm not mistaken):
import numpy as np
temp = np.vstack([x[i:-(5-i)] for i in range(5)])  # stacks the shifted arrays vertically
means = np.nanmean(temp, axis=0)
and put the original NaNs back in place with means[np.isnan(x[:-5])] = np.nan.
However, this looks redundant both in terms of memory (stacking the same array, shifted, five times) and computation.
If I understand correctly, you want to create a moving average and then populate the resulting elements as NaN if their index in the original array was NaN.
>>> import numpy as np
>>> inc = 5  # the moving average increment
>>> x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
>>> mov_avg = np.array([np.nanmean(x[idx:idx+inc]) for idx in range(len(x))])
>>> # determine indices in x that are NaN
>>> nan_idxs = np.where(np.isnan(x))[0]
>>> # populate the output array with NaNs
>>> mov_avg[nan_idxs] = np.nan
>>> mov_avg
array([ 4.75      ,  4.75      ,         nan,  4.4       ,  3.75      ,
        2.33333333,  3.33333333,  4.        ,         nan,         nan,
        3.        ,  3.5       ,         nan,  3.25      ,  4.        ,
        4.5       ,  3.        ])
Here's an approach using strides:
w = 5  # window size
n = x.strides[0]
avgs = np.nanmean(np.lib.stride_tricks.as_strided(
    x, shape=(x.size - w + 1, w), strides=(n, n)), 1)
# handle the last w-1 positions, whose windows run off the end of x
x_rem = np.append(x[-w+1:], np.full(w - 1, np.nan))
avgs_rem = np.nanmean(np.lib.stride_tricks.as_strided(
    x_rem, shape=(w - 1, w), strides=(n, n)), 1)
avgs = np.append(avgs, avgs_rem)
avgs[np.isnan(x)] = np.nan
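On NumPy 1.20+, np.lib.stride_tricks.sliding_window_view gives the same full windows without the error-prone as_strided call (the shrinking tail windows would still need the padding trick above); a minimal sketch reusing x and w:
from numpy.lib.stride_tricks import sliding_window_view
avgs_full = np.nanmean(sliding_window_view(x, w), axis=1)  # length x.size - w + 1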
Currently, the bottleneck package should do the trick quite reliably and quickly. Here is a slightly adjusted example from https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.move_mean:
>>> import bottleneck as bn
>>> a = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
>>> bn.move_mean(a, window=2)
array([ nan, 1.5, 2.5, nan, nan])
>>> bn.move_mean(a, window=2, min_count=1)
array([ 1. , 1.5, 2.5, 3. , 5. ])
Note that the resulting means correspond to the last index of the window.
The package is available from the Ubuntu repos, pip, etc. It can operate over an arbitrary axis of a numpy array, and it is claimed to be faster than a plain-numpy implementation in many cases.
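If adding a dependency is not an option, the same sum-over-valid-count idea used in the cumsum answer can also be written with np.convolve. A minimal sketch of the forward-window variant the question asks for, assuming no window is entirely NaN (the [w - 1:] slice aligns entry idx with the window x[idx:idx+w], and the trailing windows shrink automatically):
import numpy as np
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
w = 5
mask = np.isnan(x)
vals = np.where(mask, 0.0, x)  # count NaNs as 0 in the window sums
sums = np.convolve(vals, np.ones(w))[w - 1:]
counts = np.convolve((~mask).astype(float), np.ones(w))[w - 1:]
y = sums / counts  # average over the non-NaN entries only
y[mask] = np.nan   # restore NaNs at the original positions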

Constant lambda expressions, shape of return data

A list of lambda expressions is given to me (by SymPy's lambdify), some explicitly depending on a variable x, some constant. I would like to evaluate them consistently with NumPy arrays.
When evaluating a lambda expression, e.g., lambda x: 1.0 + x**2, with a NumPy array x, the result has the same shape as the array. If the expression happens not to contain x explicitly, though, e.g., g = lambda x: 1.0, only a scalar is returned.
import numpy
f = [lambda x: 1.0 + x**2, lambda x: 1.0]
X = numpy.array([1, 2, 3])
print(f[0](X))
print(f[1](X))
returns
[ 2.  5. 10.]
1.0
Is there a way to get the shapes of the output arguments consistent?
You could use ones_like:
>>> X = numpy.array([1, 2, 3])
>>> def g(x): return numpy.ones_like(x)
>>> g(X)
array([1, 1, 1])
Note that this returns integers, not floats, because that was the input dtype; you could specify dtype=float or multiply by 1.0 if you prefer to always get floats out.
PS: It's a little odd to use a lambda and then immediately give it a name. It's like wearing a mask but handing out business cards.
PPS: back before ones_like existed, I tended to use x*0 + 1 when I wanted something appropriately shaped.
I don't see the problem, just do:
import numpy as np
X = np.array([1, 2, 3])
f = lambda x: 1.0 + x**2
print(f(X))
g = lambda x: np.ones(shape=(len(x),))
print(g(X))
Which prints:
[ 2.  5. 10.]
[ 1.  1.  1.]
Notice that np.ones(shape=(len(x),)) is essentially the same as np.ones_like(x), apart from the dtype: ones defaults to float, while ones_like keeps x's integer dtype.
Use ones_like:
g = lambda x: np.ones_like(x) * 1.0
There's also this slightly hackier solution:
g = lambda x: 1.0 + (x*0)
You seem to want an array of ones:
>>> import numpy
>>> numpy.ones(3)
array([ 1.,  1.,  1.])
If you want a different scalar, it's easy to do so:
g = lambda x: numpy.ones(shape=x.shape) * 2
g(X)
returns
array([ 2.,  2.,  2.])
So for an arbitrary array:
g = lambda x: numpy.ones(shape=x.shape) * 1
n = numpy.array([1, 2, 3, 4, 5])
g(n) returns
array([ 1.,  1.,  1.,  1.,  1.])
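Since the lambdas come from lambdify and cannot easily be edited by hand, another option is to wrap each one so a scalar result is broadcast to the input's shape. A minimal sketch (shape_consistent is my own name, not an existing API; note that np.broadcast_to returns a read-only view):
import numpy as np
def shape_consistent(f):
    def wrapped(x):
        # 0-d (scalar) results get stretched to x's shape; array results pass through
        return np.broadcast_to(np.asarray(f(x), dtype=float), np.shape(x))
    return wrapped
f = [lambda x: 1.0 + x**2, lambda x: 1.0]
X = np.array([1, 2, 3])
print([shape_consistent(fi)(X) for fi in f])
# [array([ 2.,  5., 10.]), array([ 1.,  1.,  1.])]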
