Loop over elements in a numpy array of arbitrary dimension

Loop over elements in a numpy array of arbitrary dimension - python

I have a code like the following:
def infball_proj(mu, beta):
newmu = np.zeros(mu.shape)
if len(mu.shape) == 2:
for i in range(mu.shape[0]):
for j in range(mu.shape[1]):
if np.abs(mu[i,j]) > beta:
newmu[i,j] = np.sign(mu[i,j]) * beta
else:
newmu[i,j] = mu[i,j]
return newmu
elif len(mu.shape) == 1:
for i in range(mu.shape[0]):
if np.abs(mu[i]) > beta:
newmu[i] = np.sign(mu[i]) * beta
else:
newmu[i] = mu[i]
return newmu
Is there a smarter way to do this so I don't have to write the 2 different cases? It would be nice if I could have a version that scales to an arbitrary dimension (i.e. numbers of axes).

Something like this should do the job:
newmu = np.where(np.abs(mu) > beta, np.sign(mu) * beta, mu)
Or, if I get the logic right,
newmu = np.minimum(np.abs(mu), beta) * np.sign(mu)

mu[np.abs(mu)>beta] = np.sign(mu[np.abs(mu)>beta]) * beta
np.abs(mu)>beta will create a boolean array which can then be used for boolean indexing.
The LHS mu[np.abs(mu)>beta] will return a view of the elements being selected by the boolean indexing and can be assigned to the value your want, that is, the RHS.
REMEMBER: Try to avoid for-loop of NumPy array as it is very inefficient.

Related

How can this function be vectorized?

I have a NumPy array with the following properties:
shape: (9986080, 2)
dtype: np.float32
I have a method that loops over the range of the array, performs an operation and then inputs result to new array:
def foo(arr):
new_arr = np.empty(arr.size, dtype=np.uint64)
for i in range(arr.size):
x, y = arr[i]
e, n = ''
if x < 0:
e = '1'
else:
w = '2'
if y > 0:
n = '3'
else:
s = '4'
new_arr[i] = int(f'{abs(x)}{e}{abs(y){n}'.replace('.', ''))

I agree with Iguananaut's comment that this data structure seems a bit odd. My biggest problem with it is that it is really tricky to try and vectorize the putting together of integers in a string and then re-converting that to an integer. Still, this will certainly help speed up the function:
def foo(arr):
x_values = arr[:,0]
y_values = arr[:,1]
ones = np.ones(arr.shape[0], dtype=np.uint64)
e = np.char.array(np.where(x_values < 0, ones, ones * 2))
n = np.char.array(np.where(y_values < 0, ones * 3, ones * 4))
x_values = np.char.array(np.absolute(x_values))
y_values = np.char.array(np.absolute(y_values))
x_values = np.char.replace(x_values, '.', '')
y_values = np.char.replace(y_values, '.', '')
new_arr = np.char.add(np.char.add(x_values, e), np.char.add(y_values, n))
return new_arr.astype(np.uint64)
Here, the x and y values of the input array are first split up. Then we use a vectorized computation to determine where e and n should be 1 or 2, 3 or 4. The last line uses a standard list comprehension to do the string merging bit, which is still undesirably slow for super large arrays but faster than a regular for loop. Also vectorizing the previous computations should speed the function up hugely.
Edit:
I was mistaken before. Numpy does have a nice way of handling string concatenation using the np.char.add() method. This requires converting x_values and y_values to Numpy character arrays using np.char.array(). Also for some reason, the np.char.add() method only takes two arrays as inputs, so it is necessary to first concatenate x_values and e and y_values and n and then concatenate these results. Still, this vectorizes the computations and should be pretty fast. The code is still a bit clunky because of the rather odd operation you are after, but I think this will help you speed up the function greatly.

You may use np.apply_along_axis. When you feed this function with another function that takes row (or column) as an argument, it does what you want to do.
For you case, You may rewrite the function as below:
def foo(row):
x, y = row
e, n = ''
if x < 0:
e = '1'
else:
w = '2'
if y > 0:
n = '3'
else:
s = '4'
return int(f'{abs(x)}{e}{abs(y){n}'.replace('.', ''))
# Where you want to you use it.
new_arr = np.apply_along_axis(foo, 1, n)

How can I implement this recursive formula?

Photo Formula
def prob(i, r, b): #r and b may be np.arrays or lists()
if i == 0:
return 1
else:
return (1 / i) * #And what's next?
Formula denotes unnormalized probability probability P(i) - the probability that the system is in state i, wherein and P(0) = 1, where I() - indicator function, b_k and r_k - const parameters, which fed to the input. I don't know how I can implement this code. I will appreciate any help!

def prob(i, rs, bs): # i: int, rs: [num], bs: [num]:
# I not know if the elems in rs are integers, neither positives
# I apply abs in rs elems and compare with equals and less
if i <= 0:
return 1
return (1 / i) * sum(b * prop(i - abs(r)) for (r, b) in zip(rs, bs))

Vectorized code for a function to generate vector values

Suppose we have a defined function as following, and we would like to iterate over n from 1 to L, I've suffered a lot for a vectorization code, since this code is rather slow due to for loop needed outside to call this function.
Details: L, K are large integers e.g. 1000 and H_n is float value.
def multifrac_Brownian_motion(n, L, K, list_hurst, ind_hurst):
t_ks = np.asarray(sorted(-np.array(range(1, K + 1))*(1./L)))
t_ns = np.linspace(0, 1, num=L+1)
t_n = t_ns[n]
chi_k = np.random.randn(K)
chi_lminus1 = np.random.randn(L)
H_n = get_hurst_value(t_n, list_hurst, ind_hurst)
part1 = 1./(np.random.gamma(0.5 + H_n))
sums1 = np.dot((t_n - t_ks)**(H_n - 0.5) - ((-t_ks)**(H_n - 0.5)), chi_k)
sums2 = np.dot((t_n - t_ns[:n])**(H_n - 0.5), chi_lminus1[:n])
return part1*(1./np.sqrt(L))*(sums1 + sums2)
for n in range(1, L + 1):
onelist.append(multifrac_Brownian_motion(n, L, K, list_hurst, ind_hurst=ind_hurst))
Update:
def list_hurst_funcs(M, seg_size=10):
"""Generate a list of Hurst function components
Args:
M: Int, number of hurst functions
seg_size: Int, number of segmentations of interval [0, 1]
Returns:
list_hurst: List, list of hurst function components
"""
list_hurst = []
for i in range(M):
seg_points = sorted(np.random.uniform(size=seg_size))
funclist = np.random.uniform(size=seg_size + 1)
list_hurst.append((seg_points, funclist))
return list_hurst
def get_hurst_value(x, list_hurst, ind):
if np.isscalar(x):
x = np.array(float(x), ndmin=1)
seg_points, funclist = list_hurst[ind]
condlist = [x < seg_points[0]] +\
[(x >= seg_points[s] and x < seg_points[s + 1])
for s in range(len(seg_points) - 1)] +\
[x >= seg_points[-1]]
return np.piecewise(x, condlist=condlist, funclist=funclist)

One way to tackle a problem like this is to (try) understand the big picture, and come with a different approach that treats everything as 2d or larger (LxK arrays). Another is to examine the multifrac_Brownian_motion, trying to speed it up, and where possible eliminate steps that depend on scalars or 1d arrays. In other words, work from the inside out. If we get enough of a speed improvement it may not matter that we have to call it in a loop. Even better the improvement suggests ways of operating in high dimensions.
As a start from inside out, I'd suggest replacing the t_ks calc with:
t_ks = -np.arange(K,0,-1)/L # 1./L if required by Py2 integer division
Since list_hurst, ind_hurst are the same for all n, I suspect you can move some time consuming parts of get_hurst_value outside the loop.
But I'd put most effort into improving that condlist construction. That's list comprehension buried deep inside your outer loop.
piecewise also loops over those seg_points.

Pulling data from a numpy array

I have a numpy ndarray that I made using numpy.loadtxt. I want to pull an entire row from it based on a condition in the third column. Something like : if array[2][i] is meeting my conditions, then get array[0][i] and array [1][i] as well. I'm new to python, and all of the numpy features, so I'm looking for the best way to do this. Ideally, I'd like to pull 2 rows at a time, but I wont always have an even number of rows, so I imagine that is a problem
import numpy as np
'''
Created on Jan 27, 2013
#author:
'''
class Volume:
f ='/Users/Documents/workspace/findMinMax/crapc.txt'
m = np.loadtxt(f, unpack=True, usecols=(1,2,3), ndmin = 2)
maxZ = max(m[2])
minZ = min(m[2])
print("Maximum Z value: " + str(maxZ))
print("Minimum Z value: " + str(minZ))
zIncrement = .5
steps = maxZ/zIncrement
currentStep = .5
b = []
for i in m[2]:#here is my problem
while currentStep < steps:
if m[2][i] < currentStep and m[2][i] > currentStep - zIncrement:
b.append(m[2][i])
if len(b) < 2:
currentStep + zIncrement
print(b)
Here is some code that I did in java that is the general idea of what I want:
while( e < a.length - 1){
for(int i = 0; i < a.length - 1; i++){
if(a[i][2] < stepSize && a[i][2] > stepSize - 2){
x.add(a[i][0]);
y.add(a[i][1]);
z.add(a[i][2]);
}
if(x.size() < 1){
stepSize += 1;
}
}
}

First of all, you probably don't want to put your code in that class definition...
import numpy as np
def main():
m = np.random.random((3, 4))
mask = (m[2] > 0.5) & (m[2] < 0.8) # put your conditions here
# instead of 0.5 and 0.8 you can use
# an array if you like
m[:, mask]
if __name__ == '__main__':
main()
mask is a boolean array, m[:, mask] is the array you want
m[2] is the third row of m. If you type m[2] + 2 you get a new array with the old values + 2. m[2] > 0.5 creates an array with boolean values. It is best to try this stuff out with ipython (www.ipython.org)
In the expression m[:, mask] the : means "take all rows", mask describes which columns should be included.
Update
Next try :-)
for i in range(0, len(m), 2):
two_rows = m[i:i+2]

If you can write your condition as a simple function
def condition(value):
# return True or False depending on value
then you could select your subarrays like this:
cond = condition(a[2])
subarray0 = a[0,cond]
subarray1 = a[1,cond]

matplotlib hist while ignoring a particular no data value

I've got a 2D numpy array with 1.0e6 as the no data value. I'd like to generate a histogram of the data and while I've succeeded this can't be the best way to do it.
from matplotlib import pyplot
import sys
eps = sys.float_info.epsilon
no_data = 1.0e6
e_data = elevation.reshape(elevation.size)
e_data_clean = [ ]
for i in xrange(len(e_data)):
val = e_data[i]
# floating point equality check for val aprox not equal no_data
if val > no_data + eps and val < no_data - eps:
e_data_clean.append(val)
pyplot.hist(e_data_clean, bins=100)
It seems like there should be a clean (and much faster one liner for this). Is there?

You can use a boolean array to select the required indices:
selected_values = (e_data > (no_data + eps)) & (e_data < (no_data - eps))
pyplot.hist(e_data[selected_values])
(e_data > (no_data + eps)) will create an array of np.bool with the same shape as e_data, set to True at a given index if and only if the value at that index is greater than (no_data + eps). & is the element-wise and operator to satisfy both conditions.
Alternatively, if no_data is just a convention, I would set those values to numpy.nan instead, and use e_data[numpy.isfinite(e_data)].

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Loop over elements in a numpy array of arbitrary dimension - python

Something like this should do the job: newmu = np.where(np.abs(mu) > beta, np.sign(mu) * beta, mu) Or, if I get the logic right, newmu = np.minimum(np.abs(mu), beta) * np.sign(mu)

Related

How can this function be vectorized?

How can I implement this recursive formula?

Vectorized code for a function to generate vector values

Pulling data from a numpy array

matplotlib hist while ignoring a particular no data value

Categories

Resources