How to find the closest array in a 3-dimensional array


I have limited experience with Python and NumPy, and I have searched online for a long time without success. Please help or suggest some ideas for how to achieve this.
A = [3, -1, 4]
B = array([[1,1,1],[1,-1,1],[1,1,-1]])
The closest row in B is [1, -1, 1].
Matching the signs (positive or negative) of A takes priority over the distance between A and a row of B:
find the closest row in B among those whose elements all have the same signs as A.
B1 = array([[1,1,1], [1,-1,1], [1,1,-1], [3,1,4]])
The result is [1, -1, 1], even though [3, 1, 4] is nearer by distance, because its signs do not match those of A.
I searched around for a decent solution and found that everything out there was difficult to use.
Thanks in advance.

One possible way:
import numpy as np
A = np.array([3,-1, 4])
B = np.array([[1,1,1],[1,-1,1],[1,1,-1]])
# distances array-wise
np.abs(B - A)
# sum of absolute values of distances (smallest is closest)
np.sum(np.abs(B - A), axis=1)
# index of smallest (in this case index 1)
np.argmin(np.sum(np.abs(B - A), axis=1))
# all in one line (take array 1 from B)
result = B[np.argmin(np.sum(np.abs(B - A), axis=1))]

Try this,
import numpy as np
A = np.array([3, -1, 4])
B = np.array([[1, 1, 1], [1, -1, 1], [1, 1, -1]])
x = np.inf
for val in B:
    # keep val only if it is closer than the best so far and its signs match A
    if x > np.absolute(A - val).sum() and (np.sign(A) == np.sign(val)).all():
        x = np.absolute(A - val).sum()
        y = val
print(x)
print(y)
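The same idea can probably be written without an explicit loop. The following is a minimal sketch, not a definitive implementation: the function name closest_same_sign is made up for illustration, and it assumes that "closest" means the smallest sum of absolute differences, as in the loop above.

```python
import numpy as np

def closest_same_sign(A, B):
    """Return the row of B closest to A among rows whose signs all match A.

    Hypothetical helper; returns None when no row has matching signs.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    # True for rows whose element-wise signs equal those of A
    same_sign = (np.sign(B) == np.sign(A)).all(axis=1)
    if not same_sign.any():
        return None
    # sum of absolute differences per row; exclude rows with wrong signs
    dist = np.abs(B - A).sum(axis=1).astype(float)
    dist[~same_sign] = np.inf
    return B[np.argmin(dist)]

A = [3, -1, 4]
B1 = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [3, 1, 4]]
print(closest_same_sign(A, B1))  # [ 1 -1  1]
```

With B1 from the question, [3, 1, 4] has the smallest distance but is masked out because its second element is positive while A's is negative.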

Related

How to repeat an operation in a for loop n times

The first input of following code is a single value like a=3, and second input is an array consisting of pairs of values like: B = [[1000, 1], [1000, 3], [999, 4]]. If the first value of each pair is even then I want the corresponding output value to be based on some specific calculation as shown in the code; if the first value is odd, there is another calculation as shown in code. The second value in the pair is the number of times I'd like to repeat the calculation for the pair. Calculations in the example should be repeated 1, 3 and 4 times for pairs 1, 2 and 3, respectively.
I am not sure how to repeat the calculation.
import numpy as np
a = np.array(input(), dtype=int)
B = []
for i in range(a):
    b = np.array(input().split(), dtype=int)
    B.append(b)
B = np.array(B)
C = []
for i in range(a):
    if B[i, 0] % 2==0:
        c = (B[i, 0] - 99) * 3
        C.append(c)
    else:
        if B[i, 0] % 2 == 1:
            d = (B[i, 0] - 15) * 2
            C.append(d)

Ignoring the input, let's say you have an array
B = np.array([[1000, 1], [1000, 3], [999, 4]])
You can perform selective computations using a mask. The quickest way to type is probably using np.where:
C = np.where(B[:, 0] % 2, 2 * (B[:, 0] - 15), 3 * (B[:, 0] - 99))
This is a bit inefficient since it computes both values for each element. A much more efficient, but uglier, method would be to do everything in-place:
b = B[:, 0]
mask = (b % 2 == 0)
C = np.empty_like(b)
np.subtract(b, 99, out=C, where=mask)
np.subtract(b, 15, out=C, where=~mask)
np.multiply(C, 3, out=C, where=mask)
np.multiply(C, 2, out=C, where=~mask)
Numpy then provides the repeat function to do the repetition after you've computed the output values:
C = np.repeat(C, B[:, 1])
You can write the whole thing as a one-liner:
C = np.repeat(np.where(B[:, 0] % 2, 2 * (B[:, 0] - 15), 3 * (B[:, 0] - 99)), B[:, 1])
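As a quick check, here is the one-liner split into two steps and run on the sample array. The name values is introduced only for this illustration; the expected numbers follow from the formulas in the question (3 * (1000 - 99) = 2703 for the even entries and 2 * (999 - 15) = 1968 for the odd one).

```python
import numpy as np

B = np.array([[1000, 1], [1000, 3], [999, 4]])

# one output value per pair: odd first element -> 2*(b-15), even -> 3*(b-99)
values = np.where(B[:, 0] % 2, 2 * (B[:, 0] - 15), 3 * (B[:, 0] - 99))

# repeat each output value as many times as the second element says
C = np.repeat(values, B[:, 1])

print(values)  # [2703 2703 1968]
print(C)       # [2703 2703 2703 2703 1968 1968 1968 1968]
```

Note that this repeats the computed output; it does not feed the result back into the calculation.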

Probably a case for nested for loops. You have your outer loop, then loop through your calculations a certain number of times. So, from your example, try something like:
for i in range(a):
    for x in range(B[i][1]):
        if B[i][0] % 2 == 0:
            B[i][0]=(B[i][0]-99)*3
        else:
            B[i][0]=(B[i][0]-15)*2
        C.append(B[i][0])
Edited: If you want to repeat on the same value, just put it back into B and then append the value of that element of B back into C
Another note, not really related to your question: since the number will be either even or odd, you don't need the second if statement inside your else. If you really do need to check another condition explicitly, use Python's elif statement (short for "else if"), which checks its condition only if none of the previous conditions matched.
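To illustrate the elif remark, here is a small sketch. The function classify and its three categories are hypothetical and not part of the question; they only show that the elif condition is evaluated when the first condition fails, and the else branch when both fail.

```python
def classify(n):
    """Hypothetical example of if / elif / else chaining."""
    if n % 2 == 0:
        return "even"
    elif n % 3 == 0:
        # reached only when n is odd
        return "odd multiple of 3"
    else:
        # reached only when both conditions above are false
        return "other odd"

print(classify(1000))  # even
print(classify(999))   # odd multiple of 3
print(classify(7))     # other odd
```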

Fast algorithm to find exactly k columns in binary matrix such that the sum of those columns is the 1-vector

Suppose I have an (M x N) binary matrix where both M and N can be large. I want to find exactly k columns (k is relatively small, say less than 10) such that the sum of those k columns is the 1-vector (all elements are 1). One solution is adequate. Is there a fast algorithm for this?
For example, an algorithm working on the matrix
1 0 0
1 0 0
1 1 0
0 1 1
with k=2 should return columns 0 and 2, but should report no solutions if k=1 or k=3.
I've tried two approaches:
The slow combinatorial approach where I try all (N choose k) combinations and find the combination that sums to the 1-vector. This runs in O(N^k) time which is obviously horrendous.
A recursive approach, which is faster but still runs in O(N^k) worst-case time. The Python code is as below:
import numpy as np
def recursiveFn(mat, col_used_bool, col_sum_to_date, cols_to_go):
    N = len(mat)
    if cols_to_go == 1:
        col_unused = 1 - col_sum_to_date
        if list(col_unused) in [list(i) for i in mat]:
            return (True, [col_unused])
        else:
            return (False, None)
    for col_id in range(N):
        if col_used_bool[col_id]:
            continue
        if 2 not in mat[col_id]+col_sum_to_date:
            col_used_bool[col_id] = True
            x = recursiveFn(mat, col_used_bool, mat[col_id]+col_sum_to_date, cols_to_go-1)
            col_used_bool[col_id] = False
            if x[0]:
                return (True, x[1] + [mat[col_id]])
    return (False, None)
exMat = [[1,1,1,0],[0,0,1,1],[0,0,0,1]] #input by columns
exMat = [np.asarray(i) for i in exMat]
k = 2
output = recursiveFn(mat = exMat, col_used_bool = [False for i in exMat],
                     col_sum_to_date = np.asarray([0 for i in exMat[0]]), cols_to_go = k)
print(output[1])
### prints this : [array([0, 0, 0, 1]), array([1, 1, 1, 0])]
I'm unsatisfied with either of these approaches, and I feel that a smarter and faster algorithm exists. Thanks very much for your help. This is my first post on StackOverflow, so please be gentle with me if I made a faux-pas somewhere!
(If interested, I've also asked the same question on Math Stack Exchange, but there I'm less concerned about algorithmic efficiency and more concerned about mathematical techniques.)

My first attempt would be integer-programming using one of the available high-performance solvers (e.g. Cbc).
Assuming some sparsity in your incidence-matrix, those will be very efficient and are quite general (side-constraints / adaptations). They are also complete and might be able to prove infeasibility.
A simple formulation would look like:
Instance
c0 c1 c2
1 0 0 r0
1 0 0 r1
1 1 0 r2
0 1 1 r3
IP:
minimize(0) # constant objective | pure feasibility problem
sum(c_i) = k # target of columns chosen
r0 = 1 = c0 # r0 just showing the origin of the constraint; no real variable!
r1 = 1 = c0
r2 = 1 = c0 + c1
r3 = 1 = c1 + c2
c_i in {0, 1} # all variables are binary
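For reference, the same formulation can be stated in matrix form. This is only a restatement; the symbols are assumptions introduced here and do not appear above: $A \in \{0,1\}^{M \times N}$ is the binary matrix, $c$ is the vector of column indicators, and $\mathbf{1}$ is the all-ones vector.

```latex
\begin{aligned}
\min_{c} \quad & 0 \\
\text{s.t.} \quad & A c = \mathbf{1}_M, \\
& \mathbf{1}_N^{\top} c = k, \\
& c \in \{0,1\}^{N}.
\end{aligned}
```

Each row of $A c = \mathbf{1}_M$ corresponds to one of the r0..r3 constraints listed above.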
It might be possible to strengthen this formulation with additional inequalities such as clique inequalities (conflict-graph -> maximal cliques), but I am not sure whether that helps. Good solvers will do something similar dynamically by generating cuts.
A lot of theory is available. One keyword would be exact cover or all those packing/covering problems which are very similar.
Simple code-example:
import cvxpy as cp
import numpy as np
data = np.array([[1, 0, 0],
                 [1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]])
def solve(k, data):
    c = cp.Variable(data.shape[1], boolean=True)
    con = [data @ c == 1,   # every row is covered exactly once
           cp.sum(c) == k,  # exactly k columns are chosen
           c >= 0,
           c <= 1]
    obj = cp.Minimize(0)
    problem = cp.Problem(obj, con)
    problem.solve(verbose=True, solver=cp.GLPK_MI)
    if(problem.status == 'optimal'):
        return np.where(np.isclose(c.value, 1.0) == True)[0]
    else:
        assert problem.status == 'infeasible'
        return None
print(solve(2, data))
print(solve(1, data))
print(solve(3, data))
# [0 2]
# None
# None
Remarks:
The code uses cvxpy which is very powerful, but lacks some advanced integer-programming support
The only easy to use non-commercial solver is GLPK, which is very good, but usually cannot compete with Cbc
The very algebraic usage of cvxpy together with some interface-decisions lead to the unusual variable-bounds as constraints formulation here

As mentioned in the first answer, this is an exact cover problem, which is NP-hard. A classical way to address NP-hard problems is backtracking.
When considering backtracking, generally, the devil lies in the details. Different implementations can provide quite different results.
Historically, Knuth proposed Algorithm X which is a recursive, nondeterministic, depth-first, backtracking algorithm.
This algorithm is worth being tested here.
However, due to the fact that only a small number k of columns are to be selected, I would try another approach, i.e. a classical backtracking algorithm with a boolean b[j] indicating if column j is selected, with two additional tricks.
When adding column j to the current sum of columns, we can stop as soon as a "2" is encountered; we don't need to wait for the final sum to be calculated.
Instead of adding the column elements one by one, we can group p elements (corresponding to p rows) of each column into one integer, to accelerate the summing of columns. We need to select a base for that. A small base avoids overly large numbers (this is important to limit the size of the isValid[] array, see below).
Base 2 is not possible: for example, adding (1 0) and (1 0) gives (0 1), which is still a valid number.
Therefore, I propose using base 3, which makes it possible to detect an erroneous "2" during the summation. For example,
V(0 1 1 0) = 0*3**0 + 1*3**1 +1*3**2 + 0*3**3
In practice, for analyzing groups of "p" elements, we need a boolean table of size "3**p", isValid[], which tells us immediately whether a given integer is valid. This table must be precomputed during the initialization phase.
We know that we have obtained the 1-vector when all the integers are equal to the specific value (3**p - 1)/2, noting that the last group may have a different size p' < p.
Due to the large value of n, a last trick could be tested:
Look for valid solutions for a number of rows n1 < n, and then, for each candidate solution obtained, check whether it really is a solution for all n rows.
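The base-3 packing and the early stop described above might be sketched as follows. This is an illustrative sketch under stated assumptions, not a tuned implementation: the names pack_base3, has_no_two and find_k_columns are made up here, the matrix is passed as a list of rows, and the last trick (testing on n1 < n rows first) is not included.

```python
def pack_base3(column, p):
    """Pack a 0/1 column into base-3 integers, p rows per integer."""
    return [sum(v * 3**i for i, v in enumerate(column[s:s + p]))
            for s in range(0, len(column), p)]

def has_no_two(value):
    """True when no base-3 digit of value equals 2."""
    while value:
        if value % 3 == 2:
            return False
        value //= 3
    return True

def find_k_columns(rows, k, p=4):
    """Return k column indices whose sum is the 1-vector, or None."""
    n_rows = len(rows)
    columns = [pack_base3(list(col), p) for col in zip(*rows)]
    # isValid table from the text, precomputed once
    is_valid = [has_no_two(v) for v in range(3**p)]
    # the last group may hold fewer than p rows
    sizes = [min(p, n_rows - s) for s in range(0, n_rows, p)]
    target = [(3**g - 1) // 2 for g in sizes]

    def search(start, total, chosen):
        if len(chosen) == k:
            return list(chosen) if total == target else None
        for j in range(start, len(columns)):
            new_total = []
            for t, c in zip(total, columns[j]):
                s = t + c
                if not is_valid[s]:
                    break  # a "2" appeared: abandon this column early
                new_total.append(s)
            else:
                chosen.append(j)
                found = search(j + 1, new_total, chosen)
                if found is not None:
                    return found
                chosen.pop()
        return None

    return search(0, [0] * len(sizes), [])

rows = [[1, 0, 0],
        [1, 0, 0],
        [1, 1, 0],
        [0, 1, 1]]
print(find_k_columns(rows, 2))  # [0, 2]
print(find_k_columns(rows, 1))  # None
print(find_k_columns(rows, 3))  # None
```

Because the running sum and each column only contain digits 0 and 1, adding them never produces a carry, so every partial sum stays below 3**p and can be looked up directly in the table.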

Find indexes of subarray in numpy array

I have two numpy arrays, one larger, one smaller:
a = np.array([[0,1,0],[0,0,1],[0,1,1]])
b = np.array([[0],[1]])
Is there a function that I can use to find the indexes in the larger array where there is an instance of the smaller one?
Ideal result:
instances[0] = [[2, 0], [2, 1]]
instances[1] = [[1, 1], [1,2]]
Many thanks!

As far as I know there is no fast numpy function that will do this, but you can loop through and check pretty quickly.
def find_instances(a,b):
    instances = []
    for i in range(a.shape[0] - b.shape[0] + 1):
        for j in range(a.shape[1] - b.shape[1] + 1):
            if np.all(a[i:i+b.shape[0], j:j+b.shape[1]] == b):
                instances.append([i,j])
    return instances
Here each instance is the spot in the top left corner of a that matches the top left corner of b. Not quite the output you requested but it's easy enough to get the rest of the indices if you really need them from there. Hope that helps!
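Getting the remaining indices from each top-left corner could look like the sketch below. The helper names are hypothetical. The first function is an alternative to the double loop that assumes NumPy 1.20 or newer, where numpy.lib.stride_tricks.sliding_window_view is available. Indices are returned as [row, column], whereas the "ideal result" in the question appears to list them as [column, row].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_corners(a, b):
    """Top-left [row, col] of every window of a that equals b (NumPy >= 1.20)."""
    windows = sliding_window_view(a, b.shape)   # shape (H-h+1, W-w+1, h, w)
    hits = (windows == b).all(axis=(2, 3))
    return np.argwhere(hits).tolist()

def expand_instance(corner, shape):
    """All [row, col] indices covered by a match starting at corner."""
    i, j = corner
    return [[i + di, j + dj] for di in range(shape[0]) for dj in range(shape[1])]

a = np.array([[0, 1, 0], [0, 0, 1], [0, 1, 1]])
b = np.array([[0], [1]])

corners = find_corners(a, b)
instances = [expand_instance(corner, b.shape) for corner in corners]
print(corners)    # [[0, 2], [1, 1]]
print(instances)  # [[[0, 2], [1, 2]], [[1, 1], [2, 1]]]
```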

Multiplying parentheses to get a polynomial in Python

I was asked to do the newton polynomial interpolation and I was able to write the main code.
https://en.wikipedia.org/wiki/Newton_polynomial
But there is still one small thing that I have not been able to get around for a couple of days. After reading, I found a way to do it using SymPy, but I am not allowed to use anything other than basic NumPy.
Now my problem is that I am trying to multiply out something like this
p(x)=j(x-q)(x-w)(x-e)+k(x-w)(x-e)+l(x-e)+d
to get this p(x)=ax³+bx²+cx+d , so I am looking for the polynomial coefficients a,b,c,d
for example:
p(x)=5-7(x+1)+9(x+1)(x)-7(x+1)(x)(x-1)=-7x³+9x²+9x-2
Of course I am looking for the general case, not only for polynomials of third degree.
Any tip would be much appreciated; I have been really stuck on this for a couple of days.
And sorry for the sloppy notation, but it seems Stack Overflow doesn't accept LaTeX and I am not able to post a picture because I don't have the required reputation. (If there are other ways to post it properly, please tell me and I'll just post it again.)
Thanks in advance :)

First, I'll rewrite the equation as
c3(x-r3)(x-r2)(x-r1)+c2(x-r2)(x-r1)+c1(x-r1)+c0
Next, note that this is equivalent to:
((c3(x-r3)+c2)(x-r2)+c1)(x-r1)+c0
You can multiply it out if you want to check.
So in general, you can do:
poly = np.poly1d([c[n]])
for i in range(n,0,-1):
    poly = poly*np.poly1d([1,-r[i]])+np.poly1d([c[i-1]])
You can probably replace np.poly1d([c[n]]) with just c[n] and np.poly1d([c[i-1]]) with just c[i-1], if you're willing to trust the coercion to work properly.
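As a check, the nested form can be applied to the example from the question. This is a sketch under assumptions: the function name newton_to_coeffs is made up, c[0..n] holds the coefficients, r[1..n] holds the nodes, and r[0] is an unused placeholder. For p(x)=5-7(x+1)+9(x+1)(x)-7(x+1)(x)(x-1) that gives c = [5, -7, 9, -7] and nodes -1, 0, 1.

```python
import numpy as np

def newton_to_coeffs(c, r):
    """Expand the nested form into coefficients, highest power first."""
    n = len(c) - 1
    poly = np.poly1d([c[n]])
    for i in range(n, 0, -1):
        # multiply by (x - r[i]) and add the next coefficient
        poly = poly * np.poly1d([1, -r[i]]) + np.poly1d([c[i - 1]])
    return poly.coeffs

c = [5, -7, 9, -7]
r = [None, -1, 0, 1]   # r[0] is never used
print(newton_to_coeffs(c, r).tolist())  # [-7, 9, 9, -2]
```

The result matches -7x³+9x²+9x-2 from the question.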

One way of doing it is to represent a polynomial as an array a[0]..a[n], where a[i] is the coefficient of x^i. The function is then p(x) = a[0] + a[1]*x + a[2]*(x**2) + ...
To add two polynomials in this representation, pad the shorter one with 0s and add the values at matching indices.
To multiply a polynomial by k*(x**z), multiply every value by k and insert z zeros in front (a[0:0] = [0.] * z).
Using these two operations you can expand the expression and get the coefficients you want.
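The two operations might be sketched in plain Python as follows. The function names are invented for illustration, and coefficients are stored lowest power first, as described above. Multiplying by (x - r) is expressed as the sum of a multiplication by x and a multiplication by the constant -r.

```python
def poly_add(a, b):
    """Add two coefficient lists (lowest power first), padding with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def poly_mul_monomial(a, k, z):
    """Multiply polynomial a by k * x**z."""
    return [0] * z + [k * v for v in a]

def poly_mul_linear(a, r):
    """Multiply polynomial a by (x - r)."""
    return poly_add(poly_mul_monomial(a, 1, 1), poly_mul_monomial(a, -r, 0))

# p(x) = 5 - 7(x+1) + 9(x+1)(x) - 7(x+1)(x)(x-1), built from the inside out
p = [-7]
for root, coeff in [(1, 9), (0, -7), (-1, 5)]:
    p = poly_add(poly_mul_linear(p, root), [coeff])
print(p)  # [-2, 9, 9, -7]  i.e. -2 + 9x + 9x^2 - 7x^3
```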

Multiplying two polynomials x(x-1) is the same as convolving their coefficients:
# x => [1, 0]
# (x-1) => [1, -1]
numpy.convolve([1, 0], [1, -1]) # [1, -1, 0] => x^2 - x + 0
This means you can solve the problem using
import numpy
def mult(a, b):
    """
    Polynomial multiplication is simply a convolution
    """
    return numpy.convolve(a, b)
def add(a, b):
    """
    Addition is a bit complex as a and b may have different lengths.
    Simply prepend zeros to the shorter one
    """
    # convert first so that plain lists are added, not concatenated
    a, b = numpy.asarray(a), numpy.asarray(b)
    if len(a) < len(b):
        a = numpy.insert(a, 0, [0] * (len(b) - len(a)))
    if len(b) < len(a):
        b = numpy.insert(b, 0, [0] * (len(a) - len(b)))
    return a + b
# p(x)=5-7(x+1)+9(x+1)(x)-7(x+1)(x)(x-1)=-7x³+9x²+9x-2
add(
    add(
        numpy.array([5]),
        mult([-7], [1, 1]),
    ),
    add(
        mult([9], mult([1, 1], [1, 0])),
        mult([-7], mult([1, 1], mult([1, 0], [1, -1])))
    )
)
yields
array([-7, 9, 9, -2]) # => -7x^3 + 9x^2 + 9x - 2

Using numpy, we have access to the poly1d object. With that, j(x-q)(x-w)(x-e)+k(x-w)(x-e)+l(x-e)+d is equivalent to the following, where r=True tells poly1d that the list holds the roots of the polynomial rather than its coefficients:
In [ ]: j, q, w, e, k, l, d = range(1, 8)
   ...: poly1 = j*np.poly1d([q, w, e], r=True)
   ...: poly2 = k*np.poly1d([w, e], r=True)
   ...: poly3 = l*np.poly1d([e], r=True)
   ...: poly = poly1 + poly2 + poly3 + d
   ...: print(poly)
   3     2
1 x - 4 x - 3 x + 19

Numpy broadcasting with comparison operator; cyclic iteration

I have implemented a cyclic iteration function in two ways:
def Spin1(n, N) : # n - current state, N - highest state
    value = n + 1
    case1 = (value > N)
    case2 = (value <= N)
    return case1 * 0 + case2 * value
def Spin2(n, N) :
    value = n + 1
    if value > N :
        return 0
    else : return value
These functions are identical regarding the returned results. However the second function is not broadcasting-capable for a numpy array. So to test the first function I run this:
import numpy
AR1 = numpy.zeros((3, 4), dtype = numpy.uint32)
AR1[1,2] = 5
print(AR1)
print(Spin1(AR1,5))
Magically it works, and that is so sweet. So I see exactly what I want:
[[0 0 0 0]
[0 0 5 0]
[0 0 0 0]]
[[1 1 1 1]
[1 1 0 1]
[1 1 1 1]]
Now with the second function, print(Spin2(AR1,5)) fails with this error:
if value > N
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
And it's clear why, since an if statement on a whole array is meaningless. So for now I just use the first variant. But when I look at those functions I have a strong feeling that the first function performs many more mathematical operations, so I have not lost hope that it can be optimised.
Questions:
1. Is it possible to optimise the function Spin1 to do fewer operations, or how do I use the function Spin2 in broadcasting mode (possibly without making my code too ugly)? Extra question: what would be the fastest way to do that manipulation with an array?
2. Is there some standard Python function which does the same calculation (not implicitly broadcasting-capable), and what is it properly called? "Cyclic increment", probably?

There is a numpy function for this: np.where:
In [590]: AR1
Out[590]:
array([[0, 0, 0, 0],
[0, 0, 5, 0],
[0, 0, 0, 0]], dtype=uint32)
In [591]: np.where(AR1 >= 5, 0, 1)
Out[591]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
So, you could define:
def Spin1(n, N) :
    value = n + 1
    return np.where(value > N, 0, value)
NumPy also provides a way to turn normal Python functions into ufuncs:
def Spin2(n, N) :
    value = n + 1
    if value > N :
        return 0
    else : return value
Spin2 = np.vectorize(Spin2)
So that you can now call Spin2 on arrays:
In [595]: Spin2(AR1, 5)
Out[595]:
array([[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 1, 1, 1]])
However, np.vectorize mainly provides syntactic sugar. There is still a Python function call being made for each array element, which makes np.vectorized ufuncs no faster than equivalent code using Python for-loops.

Your Spin1 follows a well established pattern in array oriented languages (e.g. APL, MATLAB) for 'vectorizing' a function like Spin2. You create one or more booleans (or 0/1 arrays) to represent the various states the array elements can take, and then construct the output by multiplication and summation.
For example, to avoid divide-by-zero problems, I have used:
1/(x+(x==0))
A variation on this is to use a boolean index array to select array elements that should be changed. In this case, you want to return value, but with selected elements 'rolled over'.
def Spin3(n, N) : # n - current state, N - highest state
    value = n + 1
    value[value>N] = 0
    return value
In this case, the indexing approach is simpler, and seems to fit the program logic better. It may be faster, but I can't guarantee that. It's good to keep both approaches in mind.
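For comparison, here is a small sketch that runs the approaches side by side on the array from the question. The function names are chosen for this illustration. The third variant, spin_mod, is not taken from any of the answers: it uses the modulo operator and gives the same result only under the assumption that every entry already lies in 0..N.

```python
import numpy as np

def spin_where(n, N):
    value = n + 1
    return np.where(value > N, 0, value)

def spin_mask(n, N):
    value = n + 1          # new array, so the input is not modified
    value[value > N] = 0
    return value

def spin_mod(n, N):
    # assumption: all entries of n are already in the range 0..N
    return (n + 1) % (N + 1)

AR1 = np.zeros((3, 4), dtype=np.uint32)
AR1[1, 2] = 5

results = [f(AR1, 5) for f in (spin_where, spin_mask, spin_mod)]
print(results[0])
# [[1 1 1 1]
#  [1 1 0 1]
#  [1 1 1 1]]
print(all(np.array_equal(results[0], r) for r in results))  # True
```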

I am putting some feedback here as an answer, so as not to clutter the question. I ran timing tests on the various functions, and it turns out that assigning through a boolean mask is the fastest variant in this case (hpaulj's answer). np.where was 1.4 times slower and np.vectorize(Spin2) was 15 times slower. Out of curiosity I also wanted to test this with loops, so I wrote this algorithm for testing:
AR1 = numpy.zeros((rows, cols), dtype = numpy.uint32)
while d <= 100:
    Buf = numpy.zeros_like(AR1)
    r = 0
    c = 0
    while (r < rows) :
        while (c < cols) :
            temp = AR1[r, c] + 1
            if temp > 5 :
                Buf[r, c] = 1
            else : Buf[r, c] = temp
            c += 1
        r += 1
        c = 0
    AR1 = Buf
    d += 1
I am not sure, but this seems to be a very straightforward implementation of all the functions mentioned above. Yet it is very slow, almost 300 times slower. I have read similar questions on SO, but I still don't understand why, or what exactly is causing the slowdown. Here I intentionally used a buffer to avoid reading and writing the same elements, and I do no memory cleanup, so what could be simpler? I am confused. I don't want to open a new question, since this has been asked a few times already, so perhaps someone can comment or share good links that clarify it?
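A rough way to see the difference is to time both versions on the same data. The sketch below is illustrative only: the array size and repeat count are arbitrary, the function names are made up, and the loop version rolls over to 0 (like Spin3) rather than to 1 as in the test code above. In the explicit loop, every AR1[r, c] access, addition, comparison and assignment is a separate operation executed by the Python interpreter on a freshly created scalar object, whereas the mask version performs the same work inside NumPy's compiled loops.

```python
import timeit
import numpy as np

def spin_loop(arr, N):
    """Element-by-element version, driven by the Python interpreter."""
    out = np.zeros_like(arr)
    rows, cols = arr.shape
    for r in range(rows):
        for c in range(cols):
            temp = arr[r, c] + 1
            out[r, c] = 0 if temp > N else temp
    return out

def spin_mask(arr, N):
    """Whole-array version using a boolean mask."""
    value = arr + 1
    value[value > N] = 0
    return value

arr = np.random.randint(0, 6, size=(200, 200)).astype(np.uint32)
same = np.array_equal(spin_loop(arr, 5), spin_mask(arr, 5))
print("same result:", same)
print("loop:", timeit.timeit(lambda: spin_loop(arr, 5), number=3))
print("mask:", timeit.timeit(lambda: spin_mask(arr, 5), number=3))
```

The exact ratio depends on the machine and the array size, so no timings are quoted here.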
