Efficiently capping array elements so they add up to a sum in Python

I'm looking to implement a simple algorithm in Python that takes an array and a sum as input and finds a number X such that, if all elements of the array greater than X are replaced by X, the elements of the array add up to the sum.
How do I do this efficiently?
Here is my code:
result = []
for _ in range(int(raw_input())):
    input_array = map(int,raw_input().split())
    sum_target = raw_input()
    for e in input_array:
        test_array = input_array
        test_array[test_array > e] = e  # supposed to replace all elements > e with e, but what's wrong here?
        if sum(test_array) == sum_target:
            result.append(e)
print result

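Using the NumPy library (import numpy), you could replace the line
input_array = map(int,raw_input().split())
with
input_array = numpy.array(raw_input().split()).astype(int)
Then
test_array[test_array > e] = e
just works (with a plain list, test_array > e is a single comparison rather than an element-wise mask). You can then also use test_array.sum().
That is, if you want to alter the array in place; otherwise you could replace
test_array = input_array
with
test_array = numpy.array(input_array)
which makes a copy. Note also that sum_target is read as a string, so convert it with int(raw_input()) before comparing it with the sum.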
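A Python 3 sketch pulling this together. caps_hitting_target is the question's loop with the NumPy fixes above (a copy per candidate, a boolean mask, an integer target); it still does O(n²) work. find_cap is a different technique that is not from the answer: sort the values once and walk a running sum, which finds X in O(n log n). It assumes X may be any real number rather than one of the array's elements, so it can return a fractional X. Both function names are made up for this example.

```python
import numpy as np


def caps_hitting_target(values, target):
    """Question's loop in Python 3: every element e of values such that
    capping all elements > e at e makes the array sum to target."""
    input_array = np.asarray(values)
    result = []
    for e in input_array:
        test_array = input_array.copy()    # copy, so input_array is not modified
        test_array[test_array > e] = e     # boolean-mask assignment
        if test_array.sum() == target:
            result.append(e)
    return result


def find_cap(values, target):
    """Sort + running sum: return the (possibly fractional) X with
    sum(min(v, X) for v in values) == target, or None if none exists."""
    s = np.sort(np.asarray(values, dtype=float))
    n = len(s)
    below = 0.0                            # sum of the elements left uncapped
    for i in range(n):
        # assume s[:i] stay as they are and s[i:] are all capped at X
        x = (target - below) / (n - i)
        lower = s[i - 1] if i > 0 else -np.inf
        if lower <= x <= s[i]:
            return x
        below += s[i]
    return None                            # target exceeds the uncapped total


print(caps_hitting_target([1, 5, 3, 2], 9))  # [3] (capped sums are 4, 11, 9, 7)
print(find_cap([1, 5, 3, 2], 8))             # 2.5
```

The running-sum version works because the capped sum is a non-decreasing, piecewise-linear function of X, so on each interval between consecutive sorted values it can be solved directly.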

Related

How to iterate over tuples using jax.lax.scan

I am looking to translate a bit of code from a NumPy version listed here to a JAX-compatible version. The NumPy code iteratively calculates the value of a matrix E from the values of other matrices A, B, D, as well as from the value of E in the previous iteration, E_jm1.
Both the NumPy and JAX version work in their listed forms and produce identical results. How can I get the JAX version to work when passing A, B, D as a tuple instead of as a concatenated array? I have a specific use case where a tuple would be more useful.
I found a question asking something similar, but it just confirmed that this should be possible. There are no examples in the documentation or elsewhere that I could find.
Original NumPy version
import numpy as np
import jax
import jax.numpy as jnp
def BAND_J(A, B, D, E_jm1):
    '''
    output: E(N x N)
    input: A(N x N), B(N x N), D(N x N), E_jm1(N x N)
    𝐄ⱼ = -[𝐁 + 𝐀𝐄ⱼ₋₁]⁻¹ 𝐃
    '''
    B_inv = np.linalg.inv(B + np.dot( A, E_jm1 ))
    E = -np.dot(B_inv, D)
    return E
key = jax.random.PRNGKey(0)
N = 2
NJ = 4
# initialize matrices with random values
A, B, D = [ jax.random.normal(key, shape=(N,N,NJ)),
            jax.random.normal(key, shape=(N,N,NJ)),
            jax.random.normal(key, shape=(N,N,NJ)) ]
A_np, B_np, D_np = [np.asarray(A), np.asarray(B), np.asarray(D)]
# initialize E_0
E_0 = jax.random.normal(key+2, shape=(N,N))
E_np = np.empty((N,N,NJ))
E_np[:,:,0] = np.asarray(E_0)
# iteratively calculate E from A, B, D, and 𝐄ⱼ₋₁
for j in range(1,NJ):
    E_jm1 = E_np[:,:,j-1]
    E_np[:,:,j] = BAND_J(A_np[:,:,j], B_np[:,:,j], D_np[:,:,j], E_jm1)
JAX scan version
def BAND_J(E, ABD):
    '''
    output: E(N x N)
    input: A(N x N), B(N x N), D(N x N), E_jm1(N x N)
    '''
    A, B, D = ABD
    B_inv = jnp.linalg.inv(B + jnp.dot( A, E ))
    E = -jnp.dot(B_inv, D)
    return E, E # ("carryover", "accumulated")
abd = jnp.asarray([(A[:,:,j], B[:,:,j], D[:,:,j]) for j in range(NJ)])
# abd = tuple([(A[:,:,j], B[:,:,j], D[:,:,j]) for j in range(NJ)]) # this produces error
# ValueError: too many values to unpack (expected 3)
_, E = jax.lax.scan(BAND_J, E_0, abd)
for j in range(1, NJ):
    print(np.isclose(E[j-1], E_np[:,:,j]))
The short answer is "you can't". By design, jax.lax.scan scans over axes of arrays, not over the entries of arbitrary Python collections.
So if you want to use scan, you'll have to stack your entries into an array.
That said, since your tuple only has three elements, a good alternative would be to skip the scan and simply JIT-compile the for-loop approach. JAX tracing will effectively unroll the loop and optimize the flattened sequence of operations. While this can lead to long compile times for large loops, your application is only 3 iterations, so it shouldn't be a problem.
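For what it's worth, jax.lax.scan does accept a pytree, such as a tuple of arrays, as xs, provided every array has the same leading-axis length. What it cannot do is treat the entries of the tuple themselves as the iteration steps. The commented-out tuple of per-step tuples fails for exactly this reason: scan slices every (N, N) leaf along its first axis, so each step receives the whole outer tuple of NJ entries. Here is a minimal sketch, assuming the (N, N, NJ) layout from the question: move the NJ axis to the front of each of A, B, D and pass the three arrays as a tuple. band_step is a renamed copy of BAND_J, and the random data (with B pushed towards 3 * I so the inverses are well conditioned) is made up for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp


def band_step(E_prev, abd):
    """One scan step: abd is a tuple (A_j, B_j, D_j) of (N, N) slices."""
    A, B, D = abd
    E = -jnp.linalg.inv(B + A @ E_prev) @ D
    return E, E  # (carry, per-step output), as in the question's BAND_J


N, NJ = 2, 4
kA, kB, kD, kE = jax.random.split(jax.random.PRNGKey(0), 4)

# Illustrative data in the question's (N, N, NJ) layout. B is shifted towards
# 3 * I only so that the matrices being inverted stay well conditioned.
A = 0.1 * jax.random.normal(kA, (N, N, NJ))
B = 3.0 * jnp.eye(N)[:, :, None] + 0.1 * jax.random.normal(kB, (N, N, NJ))
D = jax.random.normal(kD, (N, N, NJ))
E_0 = jax.random.normal(kE, (N, N))

# xs is a tuple (a pytree) of three arrays. scan slices every array along its
# leading axis, so move the iteration axis (last) to the front first.
xs = (jnp.moveaxis(A, -1, 0), jnp.moveaxis(B, -1, 0), jnp.moveaxis(D, -1, 0))
E_last, Es = jax.lax.scan(band_step, E_0, xs)  # Es has shape (NJ, N, N)

# The same recursion as a plain Python loop, for comparison.
E, loop_out = E_0, []
for j in range(NJ):
    E, _ = band_step(E, (A[:, :, j], B[:, :, j], D[:, :, j]))
    loop_out.append(E)

print(np.allclose(Es, jnp.stack(loop_out), rtol=1e-4, atol=1e-5))
```

The same tuple-of-stacked-arrays pattern works with dicts or nested tuples, since scan only cares that all leaves agree on the length of their leading axis.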

How can this function be vectorized?

I have a NumPy array with the following properties:
shape: (9986080, 2)
dtype: np.float32
I have a function that loops over the rows of the array, performs an operation on each row and then writes the result to a new array:
def foo(arr):
    new_arr = np.empty(arr.shape[0], dtype=np.uint64)  # one result per row
    for i in range(arr.shape[0]):
        x, y = arr[i]
        e, n = '', ''
        if x < 0:
            e = '1'
        else:
            e = '2'
        if y > 0:
            n = '3'
        else:
            n = '4'
        new_arr[i] = int(f'{abs(x)}{e}{abs(y)}{n}'.replace('.', ''))
    return new_arr
I agree with Iguananaut's comment that this data structure seems a bit odd. My biggest problem with it is that it is really tricky to vectorize building a string out of numbers and then converting that string back to an integer. Still, this will certainly help speed up the function:
def foo(arr):
    x_values = arr[:,0]
    y_values = arr[:,1]
    # '1' where x < 0, else '2'; '3' where y > 0, else '4' (as in the loop)
    e = np.where(x_values < 0, '1', '2')
    n = np.where(y_values > 0, '3', '4')
    # unicode=True gives str (not bytes) arrays, so str arguments work below
    x_values = np.char.array(np.absolute(x_values), unicode=True)
    y_values = np.char.array(np.absolute(y_values), unicode=True)
    x_values = np.char.replace(x_values, '.', '')
    y_values = np.char.replace(y_values, '.', '')
    new_arr = np.char.add(np.char.add(x_values, e), np.char.add(y_values, n))
    return np.asarray(new_arr).astype(np.uint64)
Here, the x and y values of the input array are first split up. Then we use a vectorized computation to determine where e should be '1' or '2' and where n should be '3' or '4'. Vectorizing these computations should speed the function up hugely. In my first version, the string merging was still done with a standard list comprehension, which is faster than a regular for loop but undesirably slow for very large arrays.
Edit:
I was mistaken before. NumPy does have a nice way of handling string concatenation: the np.char.add() function. This requires converting x_values and y_values to NumPy character arrays using np.char.array(). np.char.add() only takes two arrays as inputs, so it is necessary to first concatenate x_values with e and y_values with n, and then concatenate these two results. Still, this vectorizes the computations and should be pretty fast. The code is still a bit clunky because of the rather odd operation you are after, but I think it will speed up the function greatly.
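A small consistency check, assuming the intended logic is e = '1' if x < 0 else '2' and n = '3' if y > 0 else '4' (the reading both answers use). foo_loop is the question's loop with its bugs fixed, and foo_vectorized is a condensed form of the vectorized approach that uses .astype(str) instead of np.char.array; both names are hypothetical. The sample values are exactly representable in float32, which keeps the string forms of the two versions identical. For values such as 0.1, formatting a float32 scalar in an f-string and converting a float32 array with .astype(str) may not produce the same digits, so compare on your own data before relying on it.

```python
import numpy as np


def foo_loop(arr):
    """The question's row-by-row loop, with its bugs fixed (reference version)."""
    new_arr = np.empty(arr.shape[0], dtype=np.uint64)
    for i in range(arr.shape[0]):
        x, y = arr[i]
        e = '1' if x < 0 else '2'
        n = '3' if y > 0 else '4'
        new_arr[i] = int(f'{abs(x)}{e}{abs(y)}{n}'.replace('.', ''))
    return new_arr


def foo_vectorized(arr):
    """Same result using whole-array string operations."""
    x_str = np.char.replace(np.absolute(arr[:, 0]).astype(str), '.', '')
    y_str = np.char.replace(np.absolute(arr[:, 1]).astype(str), '.', '')
    e = np.where(arr[:, 0] < 0, '1', '2')
    n = np.where(arr[:, 1] > 0, '3', '4')
    # np.char.add joins two arrays element-wise, so nest the calls
    return np.char.add(np.char.add(x_str, e), np.char.add(y_str, n)).astype(np.uint64)


sample = np.array([[1.5, -2.25], [-3.0, 4.0]], dtype=np.float32)
print(foo_loop(sample).tolist())        # [1522254, 301403]
print(foo_vectorized(sample).tolist())  # [1522254, 301403]
```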
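You may use np.apply_along_axis. When you give it a function that takes a row (or column) as its argument, it applies that function to every row (or column) of the array. Note that it still calls the Python function once per row, so it makes the code more concise rather than truly vectorized.
For your case, you may rewrite the function as below: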
def foo(row):
    x, y = row
    e, n = '', ''
    if x < 0:
        e = '1'
    else:
        e = '2'
    if y > 0:
        n = '3'
    else:
        n = '4'
    return int(f'{abs(x)}{e}{abs(y)}{n}'.replace('.', ''))
# Where you want to use it (arr is your (N, 2) input array):
new_arr = np.apply_along_axis(foo, 1, arr)

How to concatenate 2 ndarrays (variables) onto an array in a loop (the array is EMPTY in the first iteration)

I want to reproduce this MATLAB code in Python.
In MATLAB, the code is
A = [];
for ii = 0:9
    B = [ii, ii+1, ii^2];
    C = [ii+ii^2, ii-5];
    A = [A, B, C];
end
But in Python, np.hstack and np.concatenate require all the arrays to have the same number of dimensions, so to avoid an error while A is still empty in the first iteration, I have to special-case it as follows:
for ii in range(10):
    B = np.array([ii, ii+1, ii**2])
    C = np.array([ii+ii**2, ii-5])
    if ii == 0:
        A = np.hstack([B, C])
    else:
        A = np.hstack([A, B, C])
That is my Python code. B and C are variables computed in each iteration, not a repeated ndarray, so please don't close my question!
But I think it is a little troublesome and unreadable.
How can I rewrite it? (Preferably in only one line of code.)
Without knowing exactly what the result should be, I think this is close:
import numpy as np
q = np.arange(10)
bs = np.vstack((q, q+1, q**2)).T
cs = np.vstack((q+q**2, q-5)).T
a = np.hstack((bs, cs))
Or, to get the same flat 1-D array as the MATLAB loop:
a = np.hstack((bs, cs)).ravel()
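A common NumPy idiom for the original loop, sketched here as an alternative to calling np.hstack on every iteration: collect the pieces in a Python list and call np.concatenate once at the end, which also sidesteps the special case for the empty first iteration. build_with_list and build_vectorized are illustrative names; the second builds all rows at once with np.column_stack and flattens them, and both produce the same flat array as the MATLAB loop.

```python
import numpy as np


def build_with_list(n=10):
    """Loop version: append pieces to a list, concatenate once at the end."""
    pieces = []
    for ii in range(n):
        B = np.array([ii, ii + 1, ii**2])
        C = np.array([ii + ii**2, ii - 5])
        pieces.extend([B, C])
    return np.concatenate(pieces)


def build_vectorized(n=10):
    """No loop: one row [B, C] per ii, flattened row by row."""
    q = np.arange(n)
    return np.column_stack((q, q + 1, q**2, q + q**2, q - 5)).ravel()


print(build_with_list()[:10])  # [ 0  1  0  0 -5  1  2  1  2 -4]
```

Appending to a Python list is cheap, whereas np.hstack copies the whole accumulated array on every iteration, so the list idiom also scales better for long loops.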

How to vectorize a class instantiation to allow NumPy arrays as input?

I programmed a class that looks something like this:
import numpy as np
class blank():
    def __init__(self,a,b,c):
        self.a=a
        self.b=b
        self.c=c
        n=5
        c=a/b*8
        if (a>b):
            y=c+a*b
        else:
            y=c-a*b
        p = np.empty([1,1])
        k = np.empty([1,1])
        l = np.empty([1,1])
        p[0]=b
        k[0]=b*(c-1)
        l[0]=p+k
        for i in range(1, n, 1):
            p=np.append(p,l[i-1])
            k=np.append(k,(p[i]*(c+1)))
            l=np.append(l,p[i]+k[i])
        komp = np.zeros(shape=(n, 1))
        for i in range(0, n):
            pl_avg = (p[i] + l[i]) / 2
            h=pl_avg*3
            komp[i]=pl_avg*h/4
        self.tot=komp+l
And when I call it like this:
from ex1 import blank
import numpy as np
res=blank(1,2,3)
print(res.tot)
everything works well.
BUT I want to call it like this:
res = blank(np.array([1,2,3]), np.array([3,4,5]), 3)
Is there an easy way to call it for each pair of elements of these two arrays without editing the class code?
You won't be able to instantiate a class with NumPy arrays as inputs without changing the class code. @PabloAlvarez and @NagaKiran already provided an alternative: iterate with zip over the arrays and instantiate the class for each pair of elements. While this is a pretty simple solution, it defeats the purpose of using NumPy with its efficient vectorized operations.
Here is how I suggest rewriting the code:
from typing import Union
import numpy as np
def total(a: Union[float, np.ndarray],
          b: Union[float, np.ndarray],
          n: int = 5) -> np.ndarray:
    """Calculates what your self.tot was"""
    bc = 8 * a
    c = bc / b
    vectorized_geometric_progression = np.vectorize(geometric_progression,
                                                    otypes=[np.ndarray])
    l = np.stack(vectorized_geometric_progression(bc, c, n))
    l = np.atleast_2d(l)
    p = np.insert(l[:, :-1], 0, b, axis=1)
    l = np.squeeze(l)
    p = np.squeeze(p)
    pl_avg = (p + l) / 2
    komp = np.array([0.75 * pl_avg ** 2]).T
    return komp + l
def geometric_progression(bc, c, n):
    """Calculates array l"""
    return bc * np.logspace(start=0,
                            stop=n - 1,
                            num=n,
                            base=c + 2)
And you can call it with plain numbers as well as with NumPy arrays:
>>> print(total(1, 2))
[[2.6750000e+01 6.6750000e+01 3.0675000e+02 1.7467500e+03 1.0386750e+04]
[5.9600000e+02 6.3600000e+02 8.7600000e+02 2.3160000e+03 1.0956000e+04]
[2.1176000e+04 2.1216000e+04 2.1456000e+04 2.2896000e+04 3.1536000e+04]
[7.6205600e+05 7.6209600e+05 7.6233600e+05 7.6377600e+05 7.7241600e+05]
[2.7433736e+07 2.7433776e+07 2.7434016e+07 2.7435456e+07 2.7444096e+07]]
>>> print(total(3, 4))
[[1.71000000e+02 3.39000000e+02 1.68300000e+03 1.24350000e+04 9.84510000e+04]
[8.77200000e+03 8.94000000e+03 1.02840000e+04 2.10360000e+04 1.07052000e+05]
[5.59896000e+05 5.60064000e+05 5.61408000e+05 5.72160000e+05 6.58176000e+05]
[3.58318320e+07 3.58320000e+07 3.58333440e+07 3.58440960e+07 3.59301120e+07]
[2.29323574e+09 2.29323590e+09 2.29323725e+09 2.29324800e+09 2.29333402e+09]]
>>> print(total(np.array([1, 3]), np.array([2, 4])))
[[[2.67500000e+01 6.67500000e+01 3.06750000e+02 1.74675000e+03 1.03867500e+04]
[1.71000000e+02 3.39000000e+02 1.68300000e+03 1.24350000e+04 9.84510000e+04]]
[[5.96000000e+02 6.36000000e+02 8.76000000e+02 2.31600000e+03 1.09560000e+04]
[8.77200000e+03 8.94000000e+03 1.02840000e+04 2.10360000e+04 1.07052000e+05]]
[[2.11760000e+04 2.12160000e+04 2.14560000e+04 2.28960000e+04 3.15360000e+04]
[5.59896000e+05 5.60064000e+05 5.61408000e+05 5.72160000e+05 6.58176000e+05]]
[[7.62056000e+05 7.62096000e+05 7.62336000e+05 7.63776000e+05 7.72416000e+05]
[3.58318320e+07 3.58320000e+07 3.58333440e+07 3.58440960e+07 3.59301120e+07]]
[[2.74337360e+07 2.74337760e+07 2.74340160e+07 2.74354560e+07 2.74440960e+07]
[2.29323574e+09 2.29323590e+09 2.29323725e+09 2.29324800e+09 2.29333402e+09]]]
As you can see, the results of the array call match the two scalar calls.
Explanation:
First of all, I'd like to note that your calculation of p, k, and l doesn't have to be done in a loop. Moreover, calculating k is unnecessary. If you look carefully at how the elements of p and l are calculated, you'll see they are just geometric progressions (except for the first element of p):
p = [b, b*c, b*c*(c+2), b*c*(c+2)**2, b*c*(c+2)**3, b*c*(c+2)**4, ...]
l = [b*c, b*c*(c+2), b*c*(c+2)**2, b*c*(c+2)**3, b*c*(c+2)**4, b*c*(c+2)**5, ...]
So, instead of that loop, you can use np.logspace. Unfortunately, np.logspace doesn't accept an array as its base parameter, so we have no other choice but to use np.vectorize, which is just a loop under the hood...
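To check the geometric-progression observation numerically, here is a sketch comparing three ways to build l: the question's recurrence (preallocated instead of grown with np.append), np.logspace for scalar inputs, and plain broadcasting of ** for array inputs. The broadcasting variant is a swapped-in alternative to np.vectorize, not part of the answer; it relies only on (c + 2) ** np.arange(n) broadcasting against a column of bases. The function names are hypothetical.

```python
import numpy as np


def l_by_recurrence(a, b, n=5):
    """l as built by the question's loop, preallocated instead of np.append."""
    c = a / b * 8
    p, k, l = np.empty(n), np.empty(n), np.empty(n)
    p[0] = b
    k[0] = b * (c - 1)
    l[0] = p[0] + k[0]
    for i in range(1, n):
        p[i] = l[i - 1]
        k[i] = p[i] * (c + 1)
        l[i] = p[i] + k[i]          # = l[i - 1] * (c + 2)
    return l


def l_by_logspace(a, b, n=5):
    """Geometric progression b*c*(c+2)**i via np.logspace (scalar a, b)."""
    c = a / b * 8
    return b * c * np.logspace(0, n - 1, num=n, base=c + 2)


def l_by_broadcasting(a, b, n=5):
    """One row per (a, b) pair, without np.vectorize."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))
    c = a / b * 8
    return (b * c)[:, None] * (c[:, None] + 2) ** np.arange(n)


print(l_by_recurrence(1, 2))  # [8.0, 48.0, 288.0, 1728.0, 10368.0] as floats
print(l_by_broadcasting([1, 3], [2, 4]).shape)  # (2, 5)
```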
Calculating komp, though, is easily vectorized, as you can see in my example; no loops are needed there.
Also, as I already noted in a comment, your class doesn't have to be a class, so I took the liberty of changing it to a function.
Next, note that the input parameter c is overwritten, so I got rid of it. The variable y is never used. (If you do need it, you could calculate it as y = np.where(a > b, c + a * b, c - a * b); the shorter y = c + a * b * np.sign(a - b) gives c instead of c - a * b when a == b.)
And finally, I'd like to remark that creating NumPy arrays with np.append is very inefficient (as @kabanus pointed out), so you should always try to create them at once: no loops, no appending.
P.S.: I used np.atleast_2d and np.squeeze in my code, and it may be unclear why. They are necessary to avoid if-else clauses that would check the dimensions of array l. You can print the intermediate results to see what is really going on there; nothing difficult.
If it is just a matter of calling the class with pairs of elements from two arrays, a loop works well:
res = [blank(i,j,3) for i,j in zip(np.array([1,2,3]),np.array([3,4,5]))]
res is then a list of blank instances, one for each pair.
The only way I can think of to iterate over several arrays is to loop in the main program and do the operations you need inside the loop.
This solution works for each pair of elements of both arrays (note the use of the zip function to iterate over both lists together, which is fine when they are small, as described in this answer here):
for n,x in zip(np.array([1,2,3]),np.array([3,4,5])):
    res=blank(n,x,3)
    print(res.tot)
Hope it is what you need!

Retrieving array elements with an array of frequencies in NumPy

I have an array of numbers, a. I have a second array, b, specifying how many times I want to retrieve the corresponding element in a. How can this be achieved? The ordering of the output is not important in this case.
import numpy as np
a = np.arange(5)
b = np.array([1,0,3,2,0])
# desired output = [0,2,2,2,3,3]
# i.e. [a[0], a[2], a[2], a[2], a[3], a[3] ]
That's exactly what np.repeat does: np.arange(5).repeat([1,0,3,2,0]), or in general a.repeat(b).
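A short sketch of np.repeat on the question's data. It works for any array a, not just np.arange, as long as b holds one non-negative integer count per element of a. The words array is made-up data.

```python
import numpy as np

a = np.arange(5)
b = np.array([1, 0, 3, 2, 0])

# np.repeat(a, b) repeats a[i] exactly b[i] times, keeping the original order
res = np.repeat(a, b)
print(res)  # [0 2 2 2 3 3]

# the same call on non-numeric data
words = np.array(["x", "y", "z"])
print(np.repeat(words, [2, 0, 1]))  # ['x' 'x' 'z']
```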
A really inefficient way to do it is this one:
import numpy as np
a = np.arange(5)
b = np.array([1,0,3,2,0])
res = []
i = 0
for val in b:
    for aa in range(val):
        res.append(a[i])
    i += 1
print(res)
Here's one way to do it:
res = []
for i in range(len(b)):
    for j in range(b[i]):
        res.append(a[i])
res = np.array(res) # optional
