Extracting contiguous rows in an array - python

The following code in python
df['tag'] = df['Value'] < 1.0
df['mask'] = np.where(df['tag'],1,0)
first = df.index[df['tag'] & ~ df['tag'].shift(1).fillna(False)]
last = df.index[df['tag'] & ~ df['tag'].shift(-1).fillna(False)]
pr = [(i, j) for i, j in zip(first, last) if j > i + 1]
returns a list, pr, of (first, last) tuples marking runs of contiguous rows whose Value is less than 1. I have tried to translate this to Julia, so far only partially:
df[:tag]=df[:Value] .< 1.0
df[:mask]=zeros(length(df[:tag]))
df[:mask][df[:tag].==true] .= 1
df[:mask][df[:tag].==false] .= 0
How can I replicate the values for first, last, pr in Julia?

I will give you two possible approaches to this problem. The first is faster, but requires a bit more code. The second is slower, but shorter.
function getblocks1(vs)
    blocks = Tuple{Int, Int}[]
    inblock, start = false, 0
    for (i, v) in enumerate(vs)
        if inblock
            if v >= 1.0
                push!(blocks, (start, i-1))
                inblock = false
            end
        else
            if v < 1.0
                start = i
                inblock = true
            end
        end
    end
    inblock && push!(blocks, (start, length(vs)))
    blocks
end
function getblocks2(vs)
    t = [false; vs .< 1.0; false]
    dt = diff(t)
    f = findall(==(1), dt)
    l = findall(==(-1), dt) .- 1
    collect(zip(f, l))
end
The crucial thing to know is that in Julia getblocks1 will be fast, because loops in Julia are fast and the function minimizes allocations and does everything in one pass. The second implementation is more Python-like, but it allocates more and makes several passes through the whole vector.
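For readers comparing against the original Python, the single-pass idea behind getblocks1 can be sketched in plain Python as well. This is only an illustration: the name get_blocks and the threshold parameter are assumptions, the indices are 0-based (Julia's are 1-based), and unlike the pr list in the question it does not drop short runs.

```python
def get_blocks(values, threshold=1.0):
    """Return (first, last) index pairs of contiguous runs below threshold."""
    blocks = []
    start = None  # index where the current run began, or None if not in a run
    for i, v in enumerate(values):
        if v < threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            blocks.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:
        # the last run extends to the end of the data
        blocks.append((start, len(values) - 1))
    return blocks


print(get_blocks([0.5, 0.2, 1.5, 0.1, 2.0, 0.3, 0.4]))
```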


Maximum Score from Two Arrays | Which Test Case is this approach missing?

Problem Statement
Given two integer arrays A and B of size N and M respectively. You begin with a score of 0. You want to perform exactly K operations. On the iᵗʰ operation (1-indexed), you will:
Choose one integer x from either the start or the end of any one array, A or B. Remove it from that array
Add x to score.
Return the maximum score after performing K operations.
Example
Input: A = [3,1,2], B = [2,8,1,9] and K=5
Output: 24
Explanation: An optimal solution is as follows:
Choose from end of B, add 9 to score. Remove 9 from B
Choose from start of A, add 3 to score. Remove 3 from A
Choose from start of B, add 2 to score. Remove 2 from B
Choose from start of B, add 8 to score. Remove 8 from B
Choose from end of A, add 2 to score. Remove 2 from A
The total score is 9+3+2+8+2 = 24
Constraints
1 ≤ N ≤ 6000
1 ≤ M ≤ 6000
1 ≤ A[i] ≤ 10^9
1 ≤ B[i] ≤ 10^9
1 ≤ K ≤ N+M
My Approach
Since the greedy approach (always choosing the largest available end across both arrays) fails here, because it cannot break ties when the largest ends of both arrays are equal, we have to consider all possible combinations. There are overlapping sub-problems, hence DP!
Here is the Python reprex code for it.
A = [3,1,2]
N = len(A)
B = [2,8,1,9]
M = len(B)
K = 5
memo = {}
def solve(i,j, AL, BL):
    if (i,j,AL,BL) in memo:
        return memo[(i,j,AL,BL)]
    AR = (N-1)-(i-AL)
    BR = (M-1)-(j-BL)
    if AL>AR or BL>BR or i+j==K:
        return 0
    op1 = A[AL] + solve(i+1,j,AL+1,BL)
    op2 = B[BL] + solve(i,j+1,AL,BL+1)
    op3 = A[AR] + solve(i+1,j,AL,BL)
    op4 = B[BR] + solve(i,j+1,AL,BL)
    memo[(i,j,AL,BL)] = max(op1,op2,op3,op4)
    return memo[(i,j,AL,BL)]
print(solve(0,0,0,0))
In brief,
i indicates that we have performed i operations from A
j indicates that we have performed j operations from B
Total operation is thus i+j
AL is the index in A to the left of which all integers have been used. Similarly, AR is the index in A to the right of which all integers have been used.
BL is the index in B to the left of which all integers have been used. Similarly, BR is the index in B to the right of which all integers have been used.
We try out all possible combinations and choose the maximum among them at each step, memoizing the answers.
Doubt
The code worked fine for several test cases but failed for a few. The verdict was Wrong Answer, so there was no Time Limit Exceeded, Memory Limit Exceeded, syntax error or runtime error. This means the error must be a logical one.
Can anyone help identify those test cases, and explain the intuition behind why this approach fails in some cases?
Examples where the posted code gives the wrong answer:
Example 1.
A = [1, 1, 1]
N = len(A)
B = [1, 1]
M = len(B)
K = 5
print(solve(0,0,0,0)) # Output: 4 (which is incorrect)
# Correct answer is 5
Example 2.
A = [1, 1]
B = [1]
N = len(A)
M = len(B)
K = 3
print(solve(0,0,0,0)) # Output: 2 (which is incorrect)
# Correct answer is 3
Alternative Code
def solve(A, B, k):
    def solve_(a_left, a_right, b_left, b_right, remaining_ops, sum_):
        '''
        a_left - left pointer into A
        a_right - right pointer in A
        b_left - left pointer into B
        b_right - right pointer into B
        remaining_ops - remaining operations
        sum_ - sum from previous operations
        '''
        if remaining_ops == 0:
            return sum_ # out of operations
        if a_left > a_right and b_left > b_right:
            return sum_ # both left and right are empty
        if (a_left, a_right, b_left, b_right) in cache:
            return cache[(a_left, a_right, b_left, b_right)]
        max_ = sum_ # init to current sum
        if a_left <= a_right: # A not empty
            max_ = max(max_,
                       solve_(a_left + 1, a_right, b_left, b_right, remaining_ops - 1, sum_ + A[a_left]), # Draw from left of A
                       solve_(a_left, a_right - 1, b_left, b_right, remaining_ops - 1, sum_ + A[a_right])) # Draw from right of A
        if b_left <= b_right: # B not empty
            max_ = max(max_,
                       solve_(a_left, a_right, b_left + 1, b_right, remaining_ops - 1, sum_ + B[b_left]), # Draw from left of B
                       solve_(a_left, a_right, b_left, b_right - 1, remaining_ops - 1, sum_ + B[b_right])) # Draw from right of B
        cache[(a_left, a_right, b_left, b_right)] = max_ # update cache
        return cache[(a_left, a_right, b_left, b_right)]
    cache = {}
    return solve_(0, len(A) - 1, 0, len(B) - 1, k, 0)
Tests
print(solve([3,1,2], [2,8,1,9], 5)) # Output 24
print(solve([1, 1, 1], [1, 1, 1], 5)) # Output 5
The approach fails because the recursive function stops exploring sub-problems as soon as either "AL exceeds AR" or "BL exceeds BR" holds, that is, as soon as one array is exhausted, even though elements may remain in the other array.
We should stop and return 0 only when both conditions are True. If either "AL exceeds AR" or "BL exceeds BR" is False, that array still has elements and the sub-problem can be solved.
Moreover, one quick optimization: when N+M==K, every element has to be taken, so the maximum score is simply the sum of all elements of both arrays.
Here is the corrected code:
A = [3,1,2]
B = [2,8,1,9]
K = 5
N, M = len(A), len(B)
memo = {}
def solve(i,j, AL, BL):
    if (i,j,AL,BL) in memo:
        return memo[(i,j,AL,BL)]
    AR = (N-1)-(i-AL)
    BR = (M-1)-(j-BL)
    if i+j==K or (AL>AR and BL>BR):
        return 0
    ans = -float('inf')
    if AL<=AR:
        ans = max(A[AL]+solve(i+1,j,AL+1,BL),A[AR]+solve(i+1,j,AL,BL),ans)
    if BL<=BR:
        ans = max(B[BL]+solve(i,j+1,AL,BL+1),B[BR]+solve(i,j+1,AL,BL),ans)
    memo[(i,j,AL,BL)] = ans
    return memo[(i,j,AL,BL)]
if N+M==K:
    print(sum(A)+sum(B))
else:
    print(solve(0,0,0,0))
[This answer was written with help from DarryIG's answer. It is posted separately so that the code stays close to the code in the question body; DarryIG's answer uses a different function signature.]
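As a cross-check for the memoized solutions, here is a different technique, not taken from either answer: since elements can only leave an array from its ends, the t elements taken from one array always form a prefix plus a suffix, so each array can be handled on its own and the two results combined. The sketch below assumes K ≤ N+M as in the constraints; the names best_from_ends and max_score are made up for illustration, and it has not been tuned for the largest inputs.

```python
def best_from_ends(arr):
    """best[t] = largest sum obtainable by removing t elements from the ends."""
    n = len(arr)
    prefix = [0] * (n + 1)
    for i, x in enumerate(arr):
        prefix[i + 1] = prefix[i] + x
    total = prefix[n]
    best = [0] * (n + 1)
    for t in range(n + 1):
        # take `left` elements from the front and t - left from the back;
        # the back part sums to total - prefix[n - (t - left)]
        best[t] = max(prefix[left] + total - prefix[n - (t - left)]
                      for left in range(t + 1))
    return best


def max_score(a, b, k):
    """Maximum score after exactly k operations (assumes k <= len(a) + len(b))."""
    best_a = best_from_ends(a)
    best_b = best_from_ends(b)
    lo = max(0, k - len(b))  # fewest elements that must come from a
    hi = min(k, len(a))      # most elements that can come from a
    return max(best_a[t] + best_b[k - t] for t in range(lo, hi + 1))


print(max_score([3, 1, 2], [2, 8, 1, 9], 5))
```

Running it on the failing examples from the question is a quick way to compare against the recursive versions.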

Recurrent sequence task

Consider the sequence f0, f1, f2, ... defined by the recurrence f0 = 0, f1 = 1, f2 = 2 and fk = f(k-1) + f(k-3) for k >= 3.
Write a program that calculates the n elements of this sequence with the indices k1, k2, ..., kn.
Input format
The first line of the input contains an integer n (1 <= n <= 1000)
The second line contains n non-negative integers ki (0 <= ki <= 16000), separated by spaces.
Output format
Output space-separated values for fk1, fk2, ... fkn.
Memory Limit: 10MB
Time limit: 1 second
The problem is that for large values the recursive function exceeds the limits.
def f (a):
    if a <= 2:
        return a
    return f (a - 1) + f (a - 3)
n = int (input ())
nums = list (map (int, input (). split ()))
for i in range (len (nums)):
    if i <len (nums) - 1:
        print (f (nums [i]), end = '')
    else:
        print (f (nums [i]))
I also tried to solve it with a loop, but that solution exceeds the time limit (1 second):
fk1 = 0
fk2 = 0
fk3 = 0
n = int (input ())
nums = list (map (int, input (). split ()))
a = []
for i in range (len (nums)):
    itog = 0
    for j in range (1, nums [i] + 1):
        if j <= 2:
            itog = j
        else:
            if j == 3:
                itog = 0 + 2
                fk1 = itog
                fk2 = 2
                fk3 = 1
            else:
                itog = fk1 + fk3
                fk1, fk2, fk3 = itog, fk1, fk2
    if i <len (nums) - 1:
        print (itog, end = '')
    else:
        print (itog)
How else can you solve this problem so that it is optimal in time and memory?
Concerning memory, the best solution is probably the iterative one. I think you are not far from the answer. The idea is to first handle the simple cases f(k) = k (i.e., k <= 2); for all other cases k > 2 you can simply compute f(i) from (f(i-3), f(i-2), f(i-1)) until i = k. During this process you only need to keep track of the last three values (similar to what you did in the line fk1, fk2, fk3 = itog, fk1, fk2).
However, there is one more thing you need to do. If you compute fk1, fk2, ... fkn independently, it will be far too slow (unless you use a super fast machine or a Cython implementation). There is no reason to perform n independent computations: you can compute fx once for x = max(k1, k2, ..., kn) and store every requested answer fk1, fk2, ..., fkn along the way (this slows down the computation of fx a little, but you do it once instead of n times). This way it can be solved in under 1s even for n = 1000.
On my machine, independent calculations for f15000, f15001, ..., f16000 take roughly 30s; the "all at once" solution takes roughly 0.035s.
Honestly, that's not such an easy exercise. Once you have a solution, it would be interesting to post it on a site like Code Review to get some feedback :).
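The "all at once" idea described above could look roughly like the following sketch. The function name sequence_values is an assumption, and only the requested values are stored, since the numbers grow to thousands of digits and the memory limit is tight.

```python
def sequence_values(ks):
    """Return [f(k) for k in ks] using a single pass up to max(ks)."""
    wanted = set(ks)
    results = {}
    a, b, c = 0, 1, 2  # invariant: a = f(i), b = f(i+1), c = f(i+2)
    for i in range(max(ks) + 1):
        if i in wanted:
            results[i] = a  # remember only the values that were asked for
        # slide the window: f(i+3) = f(i+2) + f(i)
        a, b, c = b, c, c + a
    return [results[k] for k in ks]


# Reading the input in the format from the task might then look like:
#   n = int(input())
#   ks = list(map(int, input().split()))
#   print(*sequence_values(ks))
print(*sequence_values([6, 0, 3]))
```

Looking the answers up by index at the end also keeps them in the original input order, so no sorting is needed in this variant.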
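First, you have to sort the numbers. Then calculate the values of the sequence one by one, reusing four variables as a rolling buffer (starting from a0 = 0, a1 = 1, a2 = 2):
while True:
    a3 = a2 + a0  # f(3), f(7), f(11), ...
    a0 = a3 + a1  # f(4), f(8), f(12), ...
    a1 = a0 + a2  # f(5), f(9), f(13), ...
    a2 = a1 + a3  # f(6), f(10), f(14), ...
Lastly, return the values in their original order. To do that, you have to remember the position of every number. From [45, 22, 14, 33] make [[45,0], [22,1], [14,2], [33,3]], sort by the first element, calculate the values and put them in place of the arguments, giving [[f45,0], [f22,1], [f14,2], [f33,3]], then sort by the second element.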

How can I implement this point in polygon code in Python?

For my Computer Graphics class I was tasked with writing a polygon filler; my software renderer is currently written in Python. Right now, I want to test the pointInPolygon code I found at "How can I determine whether a 2D Point is within a Polygon?" so that I can later write my own method based on it.
The code is:
int pnpoly(int nvert, float *vertx, float *verty, float testx, float testy)
{
    int i, j, c = 0;
    for (i = 0, j = nvert-1; i < nvert; j = i++) {
        if ( ((verty[i]>testy) != (verty[j]>testy)) &&
             (testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i]) + vertx[i]) )
            c = !c;
    }
    return c;
}
And my attempt to recreate it in Python is as follows:
def pointInPolygon(self, nvert, vertx, verty, testx, testy):
    c = 0
    j = nvert-1
    for i in range(nvert):
        if(((verty[i]>testy) != (verty[j]>testy)) and (testx < (vertx[j]-vertx[i]) * (testy-verty[i]) / (verty[j]-verty[i] + vertx[i]))):
            c = not c
        j += 1
    return c
But this obviously raises an index out of range error in the second iteration, because j = nvert, and it crashes.
Thanks in advance.
You're reading the tricky C code incorrectly. The point of j = i++ is to both increment i by one and assign the old value to j. Similar python code would do j = i at the end of the loop:
j = nvert - 1
for i in range(nvert):
    ...
    j = i
The idea is that for nvert == 3, the values would go
j | i
---+---
2 | 0
0 | 1
1 | 2
Another way to achieve this is to note that j equals (i - 1) % nvert:
for i in range(nvert):
    j = (i - 1) % nvert
    ...
i.e. j lags one behind i, and the indices form a ring (like the vertices do).
More pythonic code would use itertools and iterate over the coordinates themselves. You'd have a list of pairs (tuples) called vertices, and two iterators, one of which is one vertex ahead of the other and cycles back to the beginning thanks to itertools.cycle, something like:
import itertools
c = False
# make one iterator that goes one ahead and wraps around at the end
next_ones = itertools.cycle(vertices)
next(next_ones)
for ((x1, y1), (x2, y2)) in zip(vertices, next_ones):
    # unchecked...
    # note: the division applies to (y2 - y1) only; x1 is added afterwards
    if (((y1 > testy) != (y2 > testy))
            and (testx < (x2 - x1) * (testy - y1) / (y2 - y1) + x1)):
        c = not c
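Putting the j = i fix together with the formula from the C code, a complete function might look like the sketch below. The name point_in_polygon and the choice to pass vertices as a list of (x, y) tuples are assumptions made for the example, not part of the original code.

```python
def point_in_polygon(vertices, testx, testy):
    """Ray-casting test: True if (testx, testy) lies inside the polygon."""
    inside = False
    j = len(vertices) - 1  # start with the edge from the last vertex to the first
    for i in range(len(vertices)):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        # The first condition guarantees yi != yj, so the division is safe.
        if ((yi > testy) != (yj > testy)
                and testx < (xj - xi) * (testy - yi) / (yj - yi) + xi):
            inside = not inside
        j = i  # the equivalent of the C idiom j = i++
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(square, 2, 2))
print(point_in_polygon(square, 5, 2))
```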

Why my while loop failed (python)?

I'm new to Python programming. I'm trying to write a tool for a dynamic programming algorithm. However, the last part of my program, a while loop, fails to loop. The code is:
import numpy as np
beta, rho, B, M = 0.5, 0.9, 10, 5
S = range(B + M + 1) # State space = 0,...,B + M
Z = range(B + 1) # Shock space = 0,...,B
def U(c):
    "Utility function."
    return c**beta
def phi(z):
    "Probability mass function, uniform distribution."
    return 1.0 / len(Z) if 0 <= z <= B else 0
def Gamma(x):
    "The correspondence of feasible actions."
    return range(min(x, M) + 1)
def T(v):
    """An implementation of the Bellman operator.
    Parameters: v is a sequence representing a function on S.
    Returns: Tv, a list."""
    Tv = []
    for x in S:
        # Compute the value of the objective function for each
        # a in Gamma(x), and store the result in vals (n*m matrix)
        vals = []
        for a in Gamma(x):
            y = U(x - a) + rho * sum(v[a + z]*phi(z) for z in Z)
            # the place v comes into play, v is array for each state
            vals.append(y)
        # Store the maximum reward for this x in the list Tv
        Tv.append(max(vals))
    return Tv
# create initial value
def v_init():
    v = []
    for i in S:
        val = []
        for j in Gamma(i):
            # deterministic
            y = U(i-j)
            val.append(y)
        v.append(max(val))
    return v
# Create an instance of value function
v = v_init()
# parameters
max_iter = 10000
tol = 0.0001
num_iter = 0
diff = 1.0
N = len(S)
# value iteration
value = np.empty([max_iter,N])
while (diff>=tol and num_iter<max_iter ):
    v = T(v)
    value[num_iter] = v
    diff = np.abs(value[-1] - value[-2]).max()
    num_iter = num_iter + 1
As you can see, the while loop at the bottom is meant to iterate on the value function until it converges. However, the loop runs only once and ends with num_iter=1. As far as I know, a while loop "repeats a sequence of statements until some condition becomes false", and this condition should not become false until diff converges to nearly 0.
The main part of the code works just fine when I use the following for loop instead:
value = np.empty([num_iter,N])
for x in range(num_iter):
    v = T(v)
    value[x] = v
diff = np.abs(value[-1] - value[-2]).max()
print(diff)
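You define value as np.empty(...), which allocates the array without initializing it; in practice the rows you have not written yet usually contain zeros (or other arbitrary leftovers). value[-1] and value[-2] are the last two rows of that preallocated array, not the two most recent iterates, so on the first pass neither has been filled in and their difference is typically zero. 0 is not >= 0.0001, so the condition is False and your loop exits after one iteration. Compare the new iterate with the previous one instead, for example by keeping the old v and taking the difference against the new one.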
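A minimal sketch of a loop that compares consecutive iterates directly is shown below. To keep it self-contained it uses a toy contraction, toy_operator, as a stand-in for the Bellman operator T from the question; that stand-in and the function name value_iteration are assumptions for illustration only.

```python
def toy_operator(v):
    """Hypothetical stand-in for T: a contraction with fixed point 2.0."""
    return [0.5 * x + 1.0 for x in v]


def value_iteration(operator, v, tol=1e-4, max_iter=10000):
    """Apply operator repeatedly until successive iterates differ by < tol."""
    num_iter = 0
    diff = float("inf")
    while diff >= tol and num_iter < max_iter:
        v_new = operator(v)
        # compare the new iterate with the previous one, not with unused rows
        diff = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        num_iter += 1
    return v, num_iter


v, num_iter = value_iteration(toy_operator, [0.0, 0.0, 0.0])
print(num_iter, v)
```

If the full history is still wanted, each v_new can be appended to a list inside the loop rather than written into a preallocated array.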

Pulling data from a numpy array

I have a numpy ndarray that I made using numpy.loadtxt. I want to pull an entire row from it based on a condition on the third column. Something like: if array[2][i] meets my conditions, then get array[0][i] and array[1][i] as well. I'm new to Python and to all of the numpy features, so I'm looking for the best way to do this. Ideally, I'd like to pull 2 rows at a time, but I won't always have an even number of rows, so I imagine that is a problem.
import numpy as np
'''
Created on Jan 27, 2013
#author:
'''
class Volume:
f ='/Users/Documents/workspace/findMinMax/crapc.txt'
m = np.loadtxt(f, unpack=True, usecols=(1,2,3), ndmin = 2)
maxZ = max(m[2])
minZ = min(m[2])
print("Maximum Z value: " + str(maxZ))
print("Minimum Z value: " + str(minZ))
zIncrement = .5
steps = maxZ/zIncrement
currentStep = .5
b = []
for i in m[2]:#here is my problem
while currentStep < steps:
if m[2][i] < currentStep and m[2][i] > currentStep - zIncrement:
b.append(m[2][i])
if len(b) < 2:
currentStep + zIncrement
print(b)
Here is some code that I wrote in Java that shows the general idea of what I want:
while( e < a.length - 1){
    for(int i = 0; i < a.length - 1; i++){
        if(a[i][2] < stepSize && a[i][2] > stepSize - 2){
            x.add(a[i][0]);
            y.add(a[i][1]);
            z.add(a[i][2]);
        }
        if(x.size() < 1){
            stepSize += 1;
        }
    }
}
First of all, you probably don't want to put your code in that class definition...
import numpy as np
def main():
    m = np.random.random((3, 4))
    mask = (m[2] > 0.5) & (m[2] < 0.8) # put your conditions here
    # instead of 0.5 and 0.8 you can use
    # an array if you like
    m[:, mask]
if __name__ == '__main__':
    main()
mask is a boolean array, m[:, mask] is the array you want
m[2] is the third row of m. If you type m[2] + 2 you get a new array with the old values + 2. m[2] > 0.5 creates an array with boolean values. It is best to try this stuff out with ipython (www.ipython.org)
In the expression m[:, mask] the : means "take all rows", mask describes which columns should be included.
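To make the masking concrete, here is a small deterministic sketch. The array contents are made up for illustration; the layout (one row per variable) mirrors what loadtxt with unpack=True returns in the question.

```python
import numpy as np

# Hypothetical data: row 0 holds x, row 1 holds y, row 2 holds z.
m = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [0.2, 0.6, 0.7, 0.9],
])

# Boolean mask over the columns, built from the third row only.
mask = (m[2] > 0.5) & (m[2] < 0.8)

# Keep all three rows, but only the columns where the condition holds.
selected = m[:, mask]
x_sel, y_sel, z_sel = selected

print(mask)
print(selected)
```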
Update
Next try :-)
for i in range(0, len(m), 2):
    two_rows = m[i:i+2]
If you can write your condition as a simple function
def condition(value):
    # return True or False depending on value
then you could select your subarrays like this:
cond = condition(a[2])
subarray0 = a[0,cond]
subarray1 = a[1,cond]
