Nested numpy operations - python

I have a function like this:
import numpy as np
def foo(v, w):
    return sum(np.exp(v / w))
where v is a numpy array and w is a number. Now I want to plot the value of this function for several values of w, so I need a function that works when w is a vector of arbitrary length.
My solution for now is the obvious one:
r = []
for e in w:
    r.append(foo(v, e))
but I wonder if there is a better way to do it. I also want to keep memory usage low, so I need to avoid creating a big matrix, applying the function to every value, and summing over the columns (the length of v is more than 5e+4 and the length of w is 1e+3).
Thanks

If you cannot determine an upper bound for the length of v and ensure that you stay within your memory limits, I think you will have to stick with your solution.
If you can determine an upper bound M for the length of v and meet your memory requirements with a 1000×M array, you can do this:
import numpy as np
v = np.array([1, 2, 3, 4, 5])
w = np.array([10., 5.])
c = v / w[:, np.newaxis]  # broadcasting: shape (len(w), len(v))
d = np.exp(c)
e = d.sum(axis=1)         # one result per element of w
>>> v
array([1, 2, 3, 4, 5])
>>> w
array([ 10.,   5.])
>>> c
array([[ 0.1,  0.2,  0.3,  0.4,  0.5],
       [ 0.2,  0.4,  0.6,  0.8,  1. ]])
>>> d
array([[ 1.10517092,  1.22140276,  1.34985881,  1.4918247 ,  1.64872127],
       [ 1.22140276,  1.4918247 ,  1.8221188 ,  2.22554093,  2.71828183]])
>>> e
array([ 6.81697845,  9.47916901])
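If no such bound exists, a middle ground between the loop and the full matrix is to process w in chunks, so the temporary array never grows beyond a fixed size. A minimal sketch, assuming chunked evaluation is acceptable (foo_batched and chunk_size are my own names, not from the question):
import numpy as np
def foo_batched(v, w, chunk_size=100):
    out = np.empty(len(w))
    for i in range(0, len(w), chunk_size):
        chunk = w[i:i + chunk_size]
        # temporary of shape (<=chunk_size, len(v)), reduced immediately
        out[i:i + chunk_size] = np.exp(v / chunk[:, np.newaxis]).sum(axis=1)
    return out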

Related

How to divide an array into several sections?

I have an array of length approximately 12000, something like array([0.3, 0.6, 0.3, 0.5, 0.1, 0.9, 0.4, ...]). I also have a column in a dataframe that provides values like 2, 3, 7, 3, 2, 7, .... The length of the column is 48, and the sum of those values is 36.
I want to distribute the array over those values, meaning the 12000 elements are split into consecutive sections whose lengths are proportional to the column values. For example, the first value in that column (= 2) gets its own section of length 12000*(2/36) (starting with something like [0.3, 0.6, 0.3, ...]), the second value (= 3) gets a section of length 12000*(3/36) that continues where the first one ended (something like [0.5, 0.1, 0.9, 0.4, ...]), and so on.
import pandas as pd
import numpy as np
# mock some data
a = np.random.random(12000)
df = pd.DataFrame({'col': np.random.randint(1, 5, 48)})
# cumulative section boundaries, proportional to the column values
indices = (len(a) * df.col.to_numpy() / sum(df.col)).cumsum()
indices = np.concatenate(([0], indices)).round().astype(int)
res = []
for s, e in zip(indices[:-1], indices[1:]):
    res.append(a[s:e])
# some tests
target_pcts = df.col.to_numpy() / sum(df.col)
realized_pcts = np.array([len(sl) / len(a) for sl in res])
diffs = target_pcts / realized_pcts
assert 0.99 < np.min(diffs) and np.max(diffs) < 1.01
assert all(np.concatenate([*res]) == a)
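For what it's worth, once the integer boundaries in indices are computed, the loop can be replaced by np.split, which slices a at the interior boundaries and should return the same list of sections:
res = np.split(a, indices[1:-1])  # split at the interior boundaries only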

numpy, pass individual arguments to np.vectorized function

If I have an array of x values and want to weight each x value with different coefficients and sum them, but I want this operation to happen by passing a function that handles the weighting and summation. For example, say I have x, coeffs, and a function custom_weight(x, a, b, c):
x = numpy.array([1, 2, 3, 4, 5, 6])
coeffs = numpy.array([[0.1, 0.2, 3.2], [4.5, 4.0, 0.005]])
def custom_weight(x, a, b, c):
    return a*x**2 + (x+b)**3 + x*c
I want x to be broadcast for each inner array of coeffs; in this case the final result should be an array with the shape (6, 2). The first call of custom_weight should look like custom_weight(x[0], *coeffs[0]) == custom_weight(1, 0.1, 0.2, 3.2), and the same happens for all the other x's 2-6. Then this happens again with the x's, but now using the second set of coefficients.
I do realize that I could do this manually, or with numpy.vectorize in a certain way, but I specifically want to use a function in that form. What I want is some function that would look like so:
numpy.the_function(x, coeffs, axis=0, custom_weight)
# the_function should take each x value and pass it to custom_weight as the first arg,
# then pass the column of coeffs (because axis=0) to custom_weight,
# unpacking the column into the args a, b, and c
The problem is more that your custom_weight function is not designed to be vectorized. You are looking for something like this:
def custom_weight(x, coeffs):
    return coeffs @ x**np.array([[2, 3, 1]]).T
Output:
array([[   3.5  ,    8.4  ,   15.9  ,   27.2  ,   43.5  ,   66.   ],
       [   8.505,   50.01 ,  148.515,  328.02 ,  612.525, 1026.03 ]])
So after messing around, one solution I found was to vectorize: transpose the coefficients when passing the arguments to custom_weight and then unpack them; broadcasting and np.vectorize take care of the rest.
import numpy as np
def custom_weight(x, a, b):
    return a*x**2 + b
x = np.linspace(-1, 1, 100)
coeffs = np.array([[0.2, 0.6],
                   [1.2, 0.1]])
vec_custom_weight = np.vectorize(custom_weight)
results = vec_custom_weight(x[:, np.newaxis], *coeffs.T).T
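Since this custom_weight only uses arithmetic that NumPy already broadcasts elementwise, plain broadcasting would arguably suffice here, avoiding the Python-level loop np.vectorize runs under the hood. A sketch reusing x, coeffs, and custom_weight from above:
a, b = coeffs.T  # unpack the coefficient columns, each of shape (2,)
results = custom_weight(x[:, np.newaxis], a, b).T  # shape (2, 100), same as above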

How to sample in a matrix according to the probability in each cell

I tried to code a formula from pattern recognition, but I cannot find a proper function to do the work. The problem is that I have a binary adjacency matrix A (M*N) and want to assign the value 1 or 0 to each cell. Every cell has a fixed probability P of being 1, and 0 otherwise. I searched for sampling methods in Python, and it seems that most methods only support sampling several elements from a list without considering probability. I really need help with this, and any idea is appreciated.
You could use
A = (P > numpy.random.rand(4, 5)).astype(int)
Where P is your matrix of probabilities.
To make sure the probabilities are right, you can test it using
P = numpy.ones((4, 5)) * 0.2
S = numpy.zeros((4, 5))
for i in range(100000):
    S += (P > numpy.random.rand(4, 5)).astype(int)
print(S)        # each element should be approximately 20000
print(S.mean()) # the average should be approximately 20000, too
Let's say you have your matrix of adjacency probabilities as follows:
# Create your matrix
matrix = np.random.randint(0, 10, (3, 3)) / 10.
# Returns:
array([[ 0. ,  0.4,  0.2],
       [ 0.9,  0.7,  0.4],
       [ 0.1,  0. ,  0.5]])
# Now you can use np.where; set the threshold as you like (here 0.5)
threshold = 0.5
np.where(matrix < threshold, 0, 1)
# Returns:
array([[0, 0, 0],
       [1, 1, 0],
       [0, 0, 1]])
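As an aside, NumPy can also draw the 0/1 samples directly from per-cell probabilities. A minimal sketch, assuming P is the matrix of probabilities and numpy is imported as np:
A = np.random.binomial(1, P)  # one independent Bernoulli draw per cell; result has P's shape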

Calculate moving average in numpy array with NaNs

I am trying to calculate the moving average in a large numpy array that contains NaNs. Currently I am using:
import numpy as np
def moving_average(a, n=5):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n-1:] / n
When calculating with a masked array:
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
mx = np.ma.masked_array(x, np.isnan(x))
y = moving_average(mx).filled(np.nan)
print(y)
>>> array([3.8, 3.8, 3.6, nan, nan, nan, 2, 2.4, nan, nan, nan, 2.8, 2.6])
The result I am looking for (below) should ideally have NaNs only where the original array x had NaNs, and the averaging should be done over the number of non-NaN elements in each window (I also need some way to change the window size n in the function):
y = array([4.75, 4.75, nan, 4.4, 3.75, 2.33, 3.33, 4, nan, nan, 3, 3.5, nan, 3.25, 4, 4.5, 3])
I could loop over the entire array and check index by index, but the array I am using is very large and that would take a long time. Is there a numpythonic way to do this?
Pandas has a lot of really nice functionality for this. For example:
import pandas as pd
x = np.array([np.nan, np.nan, 3, 3, 3, np.nan, 5, 7, 7])
# requires three valid values in a row, or the resulting value is null
print(pd.Series(x).rolling(3).mean())
# output: nan, nan, nan, nan, 3, nan, nan, nan, 6.333
# only requires 2 valid values out of 3 for a size-3 window
print(pd.Series(x).rolling(3, min_periods=2).mean())
# output: nan, nan, nan, 3, 3, 3, 4, 6, 6.333
You can play around with the window/min_periods arguments and consider filling in nulls, all in one chained line of code.
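For instance, one possible chained version (a sketch reusing x from above; .ffill() forward-fills whatever NaNs the rolling window could not cover, so only the leading NaNs remain):
y = pd.Series(x).rolling(3, min_periods=2).mean().ffill().to_numpy()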
I'll just add to the great answers above that you could still use cumsum to achieve this:
import numpy as np
def moving_average(a, n=5):
    ret = np.cumsum(a.filled(0))
    ret[n:] = ret[n:] - ret[:-n]
    counts = np.cumsum(~a.mask)
    counts[n:] = counts[n:] - counts[:-n]
    ret[~a.mask] /= counts[~a.mask]
    ret[a.mask] = np.nan
    return ret
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
mx = np.ma.masked_array(x, np.isnan(x))
y = moving_average(mx)
You could create a temporary array and use np.nanmean() (new in version 1.8, if I'm not mistaken):
import numpy as np
temp = np.vstack([x[i:-(5-i)] for i in range(5)])  # stacks the shifted arrays vertically
means = np.nanmean(temp, axis=0)
and put the original NaNs back in place with means[np.isnan(x[:-5])] = np.nan.
However, this looks redundant both in terms of memory (stacking the same array, shifted, five times) and computation.
If I understand correctly, you want to create a moving average and then populate the resulting elements as NaN if their index in the original array was NaN.
>>> import numpy as np
>>> inc = 5  # the moving average increment
>>> x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
>>> mov_avg = np.array([np.nanmean(x[idx:idx+inc]) for idx in range(len(x))])
>>> # determine indices in x that are NaN
>>> nan_idxs = np.where(np.isnan(x))[0]
>>> # populate the output array with NaNs
>>> mov_avg[nan_idxs] = np.nan
>>> mov_avg
array([ 4.75      ,  4.75      ,         nan,  4.4       ,  3.75      ,
        2.33333333,  3.33333333,  4.        ,         nan,         nan,
        3.        ,  3.5       ,         nan,  3.25      ,  4.        ,
        4.5       ,  3.        ])
Here's an approach using strides:
w = 5  # window size
n = x.strides[0]
avgs = np.nanmean(np.lib.stride_tricks.as_strided(
    x, shape=(x.size - w + 1, w), strides=(n, n)), 1)
# handle the last w-1 positions, whose windows run off the end of x
x_rem = np.append(x[-w+1:], np.full(w - 1, np.nan))
avgs_rem = np.nanmean(np.lib.stride_tricks.as_strided(
    x_rem, shape=(w - 1, w), strides=(n, n)), 1)
avgs = np.append(avgs, avgs_rem)
avgs[np.isnan(x)] = np.nan
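On NumPy 1.20+, np.lib.stride_tricks.sliding_window_view gives the same full windows without the error-prone as_strided call (the shrinking tail windows would still need the padding trick above); a minimal sketch reusing x and w:
from numpy.lib.stride_tricks import sliding_window_view
avgs_full = np.nanmean(sliding_window_view(x, w), axis=1)  # length x.size - w + 1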
Currently, the bottleneck package should do the trick quite reliably and quickly. Here is a slightly adjusted example from https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.move_mean:
>>> import bottleneck as bn
>>> a = np.array([1.0, 2.0, 3.0, np.nan, 5.0])
>>> bn.move_mean(a, window=2)
array([ nan, 1.5, 2.5, nan, nan])
>>> bn.move_mean(a, window=2, min_count=1)
array([ 1. , 1.5, 2.5, 3. , 5. ])
Note that the resulting means correspond to the last index of the window.
The package is available from the Ubuntu repos, pip, etc. It can operate over an arbitrary axis of a numpy array, and it is claimed to be faster than a plain-numpy implementation in many cases.
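If adding a dependency is not an option, the same sum-over-valid-count idea used in the cumsum answer can also be written with np.convolve. A minimal sketch of the forward-window variant the question asks for, assuming no window is entirely NaN (the [w - 1:] slice aligns entry idx with the window x[idx:idx+w], and the trailing windows shrink automatically):
import numpy as np
x = np.array([1., 3, np.nan, 7, 8, 1, 2, 4, np.nan, np.nan, 4, 4, np.nan, 1, 3, 6, 3])
w = 5
mask = np.isnan(x)
vals = np.where(mask, 0.0, x)  # count NaNs as 0 in the window sums
sums = np.convolve(vals, np.ones(w))[w - 1:]
counts = np.convolve((~mask).astype(float), np.ones(w))[w - 1:]
y = sums / counts  # average over the non-NaN entries only
y[mask] = np.nan   # restore NaNs at the original positions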

Constant lambda expressions, shape of return data

A list of lambda expressions is given to me (by SymPy's lambdify), some explicitly depending on a variable x, some constant. I would like to evaluate them consistently with NumPy arrays.
When evaluating a lambda expression, e.g., lambda x: 1.0 + x**2, with a NumPy array x, the result has the same shape as the array. If the expression happens not to contain x explicitly, though, e.g., g = lambda x: 1.0, only a scalar is returned.
import numpy
f = [lambda x: 1.0 + x**2, lambda x: 1.0]
X = numpy.array([1, 2, 3])
print(f[0](X))
print(f[1](X))
returns
[ 2.  5. 10.]
1.0
Is there a way to get the shapes of the output arguments consistent?
You could use ones_like:
>>> X = numpy.array([1, 2, 3])
>>> def g(x): return numpy.ones_like(x)
>>> g(X)
array([1, 1, 1])
Note that this returns integers, not floats, because that was the input dtype; you could specify dtype=float or multiply by 1.0 if you prefer to always get floats out.
PS: It's a little odd to use a lambda and then immediately give it a name. It's like wearing a mask but handing out business cards.
PPS: back before ones_like existed, I tended to use x*0 + 1 when I wanted something appropriately shaped.
I don't see the problem, just do:
import numpy as np
X = np.array([1, 2, 3])
f = lambda x: 1.0 + x**2
print(f(X))
g = lambda x: np.ones(shape=(len(x),))
print(g(X))
Which prints:
[ 2.  5. 10.]
[ 1.  1.  1.]
Notice that np.ones(shape=(len(x),)) is essentially the same as np.ones_like(x), apart from the dtype: ones defaults to float, while ones_like keeps x's integer dtype.
Use ones_like:
g = lambda x: np.ones_like(x) * 1.0
There's also this slightly hackier solution:
g = lambda x: 1.0 + (x*0)
You seem to want an array of ones:
>>> import numpy
>>> numpy.ones(3)
array([ 1.,  1.,  1.])
If you want a different scalar, it's easy to do so:
g = lambda x: numpy.ones(shape=x.shape) * 2
g(X)
returns
array([ 2.,  2.,  2.])
So for an arbitrary array:
g = lambda x: numpy.ones(shape=x.shape) * 1
n = numpy.array([1, 2, 3, 4, 5])
g(n) returns
array([ 1.,  1.,  1.,  1.,  1.])
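Since the lambdas come from lambdify and cannot easily be edited by hand, another option is to wrap each one so a scalar result is broadcast to the input's shape. A minimal sketch (shape_consistent is my own name, not an existing API; note that np.broadcast_to returns a read-only view):
import numpy as np
def shape_consistent(f):
    def wrapped(x):
        # 0-d (scalar) results get stretched to x's shape; array results pass through
        return np.broadcast_to(np.asarray(f(x), dtype=float), np.shape(x))
    return wrapped
f = [lambda x: 1.0 + x**2, lambda x: 1.0]
X = np.array([1, 2, 3])
print([shape_consistent(fi)(X) for fi in f])
# [array([ 2.,  5., 10.]), array([ 1.,  1.,  1.])]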
