Efficiently count IntervalVars between given start/end times

Is there an efficient way to count the number of IntervalVars between a given start and end time?
I'm trying to implement an employee rostering script. We have a demand that we have already generated that tells us how many employees should be working during a given interval.
What I would like to end up with is an IntVar for each hour i of the 24 one-hour intervals, giving the total number of employees with starttime <= i <= endtime.
Below is a simple example.
from ortools.sat.python import cp_model
def main():
    # init model
    model = cp_model.CpModel()
    emps = range(0,3)
    emp_intervalvars = []
    for e in emps:
        start = model.NewIntVar(0,24,'st_e%i' % e)
        end = model.NewIntVar(0,24,'et_e%i' % e)
        dur = model.NewIntVar(0,24,'dur_e%i' % e)
        pres = model.NewBoolVar('pres_e%i' % e)
        interval = model.NewOptionalIntervalVar(start, dur, end, pres, 'interval_e%s' % e)
        # calc start
        model.Add(start == (end - dur)).OnlyEnforceIf(pres)
        # make sure to set start/end to 0 if not present
        model.Add(dur == 0).OnlyEnforceIf(pres.Not())
        model.Add(start == 0).OnlyEnforceIf(pres.Not())
        model.Add(end == 0).OnlyEnforceIf(pres.Not())
        # make sure to set start/duration to > 0 if present
        model.Add(dur > 0).OnlyEnforceIf(pres)
        model.Add(end > 0).OnlyEnforceIf(pres)
        # all emps between 8am and 6pm
        model.Add(start >= 8).OnlyEnforceIf(pres)
        model.Add(end <= 18).OnlyEnforceIf(pres)
        if e == 0:
            # lets say emp0 works mornings
            model.Add(end <= 14)
        elif e == 2:
            # and emp2 works evenings
            model.Add(start >= 11)
        emp_intervalvars.append({
            "present":pres,
            "start":start,
            "end":end,
            "duration":dur,
            "interval":interval
        })
    # simple objective
    durations = list(map(lambda v: v["duration"], emp_intervalvars))
    model.Maximize(sum(durations))
    solver = cp_model.CpSolver()
    solver.parameters.num_search_workers=8
    solver.parameters.max_time_in_seconds=30
    solver.parameters.log_search_progress=True
    status = solver.Solve(model)
    print(solver.StatusName(status))
    for i,field in enumerate(model._CpModel__model.variables):
        if field.name == '':
            continue
        print("{} : {}".format(field.name,solver._CpSolver__solution.solution[i]))
    return
if __name__ == '__main__':
    main()

A few comments:
# calc start
model.Add(start == (end - dur)).OnlyEnforceIf(pres)
This is already enforced by the interval variable, which posts exactly this constraint.
model.Add(end > 0).OnlyEnforceIf(pres)
is most likely not useful, but you can keep it.
Now, to your question. Given start and end variables and a time i:
overlap_i = model.NewBoolVar('overlap_%i' % i)
before_i = model.NewBoolVar('before_%i' % i)
after_i = model.NewBoolVar('after_%i' % i)
model.Add(start <= i).OnlyEnforceIf(overlap_i)
model.Add(end > i).OnlyEnforceIf(overlap_i) # Intervals are open ended on the right
model.Add(end <= i).OnlyEnforceIf(before_i)
model.Add(start > i).OnlyEnforceIf(after_i)
model.Add(overlap_i + before_i + after_i == 1)
should do the trick.
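To see what the three booleans encode, here is a plain-Python check on fixed (already solved) shifts. It is only a sketch of the semantics, not the CP-SAT model itself: the helper name classify and the sample shifts are made up for illustration. In the model, the per-hour head count would presumably be the sum of the overlap_i booleans over all employees.

```python
def classify(start, end, i):
    """Return which of the three mutually exclusive cases holds for hour i."""
    if start <= i < end:   # overlap_i: the half-open interval [start, end) covers i
        return "overlap"
    if end <= i:           # before_i: the shift ended at or before i
        return "before"
    return "after"         # after_i: the shift starts after i

# Hypothetical solved shifts as (start, end) pairs, one per employee.
shifts = [(8, 14), (8, 18), (11, 18)]

# Head count per hour: how many shifts are in the "overlap" case.
staff_per_hour = [
    sum(classify(s, e, i) == "overlap" for s, e in shifts)
    for i in range(24)
]
print(staff_per_hour)
```

Because the interval is open on the right, a shift ending at 18 is not counted for hour 18. An absent shift pinned to start = end = 0, as in the question, always falls into the "before" case and is never counted.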

Check a condition every 3 minutes without functions and without interrupting the loop

I have this working code that checks a condition every 3 minutes based on the local time, i.e. at minutes 0, 3, 6, 9, ... It prints "checking condition".
import time
def get_next_time():
    minute = time.localtime().tm_min
    result = 3 - (minute % 3) + minute
    if result == 60:
        result = 0
    return result
next_run = get_next_time()
while True:
    now = time.localtime()
    if next_run == now.tm_min:
        print("checking condition")
        #some condition
        next_run = get_next_time()
    time.sleep(1)
The problem is that I need the code without functions, so I need to find a way to write it without defining any function, and I cannot use break or otherwise interrupt the loop.
I tried:
while True:
    minute = time.localtime().tm_min
    result = 3 - (minute % 3) + minute
    if result == 60:
        result = 0
    now = time.localtime()
    if result == now.tm_min:
        print("checking conditions")
    time.sleep(1)
But it does not work: it never prints anything.
Any ideas?

You can compact the function into one statement:
import time
next_run = (3 - (time.localtime().tm_min % 3) + time.localtime().tm_min)%60
while True:
    now = time.localtime()
    if next_run == now.tm_min:
        print("checking condition")
        #checking conditions...
        next_run=(3 - (time.localtime().tm_min % 3) + time.localtime().tm_min)%60
    time.sleep(1)

In your first version, get_next_time() is only executed when next_run == now.tm_min. In your second version, you recompute the target on every iteration, so result is always ahead of the current minute and the comparison is never true. Compute it once before the loop and again only after a match:
import time
minute = time.localtime().tm_min
result = 3 - (minute % 3) + minute
if result == 60:
    result = 0
while True:
    now = time.localtime()
    if result == now.tm_min:
        print("checking conditions")
        minute = time.localtime().tm_min
        result = 3 - (minute % 3) + minute
        if result == 60:
            result = 0
    time.sleep(1)
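The difference between the two variants can be checked without waiting on a real clock. The sketch below replaces time.localtime() with a simulated list of minutes (an assumption made purely for illustration) and records when each variant would fire.

```python
# Simulated clock: one tick per minute for two hours (no real sleeping).
minutes = [m % 60 for m in range(120)]

# Variant A: recompute the target on every iteration (the attempt that never fires).
fires_a = []
for minute in minutes:
    target = (3 - (minute % 3) + minute) % 60
    if target == minute:
        fires_a.append(minute)

# Variant B: recompute the target only after it has been reached.
fires_b = []
target = (3 - (minutes[0] % 3) + minutes[0]) % 60
for minute in minutes:
    if target == minute:
        fires_b.append(minute)
        target = (3 - (minute % 3) + minute) % 60

print(fires_a)      # []
print(fires_b[:5])  # [3, 6, 9, 12, 15]
```

In variant A the target is always one to three minutes ahead of the minute it was computed from, so the equality can never hold within the same iteration.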

Rounding to the next multiple of 3 minutes contradicts the specification "every 0...".
It is enough to do
import time
first= True
while True:
    minute= time.localtime().tm_min
    if first or minute == target:
        print("checking condition")
        first= False
        target= (minute + 3) % 60
    time.sleep(1)
Update:
I modified the code so that a single call to localtime is made on every iteration, to make fully sure that the minutes do not change between the calls.
More compact but less efficient:
import time
while True:
    minute= time.localtime().tm_min
    if 'target' not in locals() or minute == target:
        print("checking condition")
        target= (minute + 3) % 60
    time.sleep(1)

How do I create a time series with 15min buckets in pyspark?

I'm trying to create a report that shows the total number of minutes worked by a group of employees in 15 minute increments.
The source table has the time in/out and total minutes worked, one record for each employee.
I've created an RDD row-wise mapping function that loops through the hours of the day, with an inner loop for each 15-minute increment.
Each loop should add a column to the RDD row dictionary.
I've confirmed the resulting schema contains these new columns, but I'm missing lots of data in the final output.
I'm not sure if it's a problem with the row iteration or the stacking.
This is the starting schema - [image not reproduced]
Any ideas?
final schema - [image not reproduced]
Updated code -
def create_time_block_columns(row_dict):
    inhour = row_dict['inhour']
    outhour = row_dict['outhour']
    inminute = row_dict['inminute']
    outminute = row_dict['outminute']
    # loop through hours of day
    for i in range(24):
        # loop through quarter hour blocks
        for j in range(1,5):
            lowerBound = (j-1)*15
            upperBound = j*15
            # create column names like 't_0_0', 't_0_15', t_0_30', 't_0_45', 't_1_0', etc...
            timeBlockColumnName = F't_{i}_{lowerBound}'
            # Add a new key in the dictionary with the new column name and value.
            # initialized to 0
            row_dict[timeBlockColumnName] = 0
            # if the employee was currently clocked in
            if (inhour <= i) & (outhour >= i):
                # if the inhour is the current time block hour and the outhour is in a future time block
                # this means they worked the rest of the hour
                # start_during_end_after
                if (i == inhour) & (outhour > i):
                    if (inminute >= lowerBound):
                        row_dict[timeBlockColumnName] = (upperBound - inminute)
                    else:
                        row_dict[timeBlockColumnName] = 15
                # if the current row is completely within the current time block [hour and minutes]
                # this means they worked all 15 minutes of each hour quarter
                elif (i < inhour) & (i > outhour):
                    row_dict[timeBlockColumnName] = 15
                # if the inhour is before the current timeblock hour, and outhour is the current hour
                # this means they worked all minutes in the current block up-to the outminute
                elif (i < inhour) & (i == outhour):
                    if (outminute < lowerBound):
                        row_dict[timeBlockColumnName] = outminute - lowerBound
                    else:
                        row_dict[timeBlockColumnName] = 15
                # if the inhour and outhour are the current timeblock hour, and they are the same hour,
                # we'll calculated the difference between minutes
                elif (i == inhour) & (i == outhour):
                    if (inminute >= lowerBound) & (outminute <= upperBound):
                        row_dict[timeBlockColumnName] = outminute - inminute
                    elif (inminute < lowerBound) & (outminute >= upperBound):
                        row_dict[timeBlockColumnName] = 15
                    elif (inminute >= lowerBound) & (outminute >= upperBound):
                        row_dict[timeBlockColumnName] = upperBound - inminute
                    elif (inminute < lowerBound) & (outminute <= upperBound):
                        row_dict[timeBlockColumnName] = outminute - lowerBound
            # else: we don't do anything because the employee wasnt clocked in
    return row_dict
mappedDF = Map.apply(frame = dyF, f = create_time_block_columns).toDF()
# output some interesting logs for debugging
mappedDF.printSchema()
# Build expression to stack new columns as rows
stack_expression = F"stack({24*4}"
for i in range(24):
    for j in range(1,5):
        stack_expression += F", 't_{i}_{(j-1)*15}', t_{i}_{(j-1)*15}"
stack_expression += ') as (time_block, minutes_worked)'
timeBlockDF = mappedDF.select('pos_key', 'p_dob', 'dob', 'employee', 'rate', 'jobcode', 'pay', 'overpay', 'minutes', F.expr(stack_expression))
timeBlockDF = timeBlockDF.filter('minutes_worked > 0') \
    .withColumn("dob",F.col("dob").cast(DateType()))
# create time block identifier column
time_pattern = r't_(\d+)_(\d+)'
timeBlockDF = timeBlockDF.withColumn('time_block_hour', F.regexp_extract('time_block', time_pattern, 1).cast(IntegerType())) \
    .withColumn('time_block_min', F.regexp_extract('time_block', time_pattern, 2).cast(IntegerType())) \
    .drop('time_block') \
    .withColumn('time_block_time', F.concat_ws(':', F.format_string("%02d", F.col('time_block_hour')), F.format_string("%02d", F.col('time_block_min')))) \
    .withColumn('time_block_temp', F.concat_ws(' ', F.col('dob'), F.col('time_block_time'))) \
    .withColumn('time_block_datetime', F.to_timestamp(F.col('time_block_temp'), 'yyyy-MM-dd HH:mm')) \
    .withColumn('time_block_pay', ((F.col('pay') + F.col('overpay')) / F.col('minutes')) * F.col('minutes_worked')) \
    .drop('time_block_temp', 'pay', 'overpay', 'minutes')
# output some interesting logs for debugging
timeBlockDF.printSchema()

The problem was with the mapping function (UDF): several cases were not handled by its conditions.
The stack expression was working fine.
Here is a working example [without considering shifts that span midnight].
def create_time_block_columns(row_dict):
    inhour = row_dict['inhour']
    outhour = row_dict['outhour']
    inminute = row_dict['inminute']
    outminute = row_dict['outminute']
    # loop through hours of day
    for i in range(24):
        # loop through quarter hour blocks
        for j in range(1,5):
            lowerBound = (j-1)*15
            upperBound = j*15
            # create column names like 't_0_0', 't_0_15', 't_0_30', 't_0_45', 't_1_0', etc...
            timeBlockColumnName = F't_{i}_{lowerBound}'
            # Add a new key in the dictionary with the new column name and value.
            # initialized to 0
            row_dict[timeBlockColumnName] = 0
            # if the employee was currently clocked in
            if (inhour <= i) & (outhour >= i):
                # if the inhour is the current time block hour and the outhour is in a future time block
                # this means they worked the rest of the hour
                # start_during_end_after
                # (negative results are dropped later by the minutes_worked > 0 filter)
                if (i == inhour) & (outhour > i):
                    if (inminute >= lowerBound):
                        row_dict[timeBlockColumnName] = (upperBound - inminute)
                    else:
                        row_dict[timeBlockColumnName] = 15
                # if the current hour lies strictly between inhour and outhour
                # this means they worked all 15 minutes of each hour quarter
                elif (inhour < i) & (i < outhour):
                    row_dict[timeBlockColumnName] = 15
                # if the inhour is before the current timeblock hour, and outhour is the current hour
                # this means they worked all minutes in the current block up-to the outminute
                elif (inhour < i) & (i == outhour):
                    if (outminute < upperBound):
                        row_dict[timeBlockColumnName] = outminute - lowerBound
                    else:
                        row_dict[timeBlockColumnName] = 15
                # if the inhour and outhour are the current timeblock hour, and they are the same hour,
                # we'll calculate the difference between minutes
                elif (i == inhour) & (i == outhour):
                    if (inminute >= lowerBound) & (outminute <= upperBound):
                        row_dict[timeBlockColumnName] = outminute - inminute
                    elif (inminute < lowerBound) & (outminute >= upperBound):
                        row_dict[timeBlockColumnName] = 15
                    elif (inminute >= lowerBound) & (outminute >= upperBound):
                        row_dict[timeBlockColumnName] = upperBound - inminute
                    elif (inminute < lowerBound) & (outminute <= upperBound):
                        row_dict[timeBlockColumnName] = outminute - lowerBound
            # else: we don't do anything because the employee wasn't clocked in
    return row_dict
mappedDF = Map.apply(frame = dyF, f = create_time_block_columns).toDF()
# output some interesting logs for debugging
mappedDF.printSchema()
# Build expression to stack new columns as rows
stack_expression = F"stack({24*4}"
for i in range(24):
    for j in range(1,5):
        stack_expression += F", 't_{i}_{(j-1)*15}', t_{i}_{(j-1)*15}"
stack_expression += ') as (time_block, minutes_worked)'
timeBlockDF = mappedDF.select('pos_key', 'p_dob', 'dob', 'employee', 'rate', 'jobcode', 'pay', 'overpay', 'minutes', F.expr(stack_expression))
timeBlockDF = timeBlockDF.filter('minutes_worked > 0') \
    .withColumn("dob",F.col("dob").cast(DateType()))
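A different way to avoid missing a case is to convert both times to minutes since midnight and compute the overlap of the shift with each 15-minute block directly. The sketch below is plain Python, not the Glue/Spark mapping function above. The function name minutes_per_block is hypothetical, and like the example above it assumes the shift does not span midnight.

```python
def minutes_per_block(inhour, inminute, outhour, outminute, block=15):
    """Return {column_name: minutes worked} for every block the shift touches."""
    start = inhour * 60 + inminute   # clock-in, minutes since midnight
    end = outhour * 60 + outminute   # clock-out, minutes since midnight
    result = {}
    for block_start in range(0, 24 * 60, block):
        # Overlap of [start, end) with [block_start, block_start + block).
        overlap = min(end, block_start + block) - max(start, block_start)
        if overlap > 0:
            hour, minute = divmod(block_start, 60)
            result[f"t_{hour}_{minute}"] = overlap
    return result

# A shift from 09:50 to 10:20 touches three blocks.
print(minutes_per_block(9, 50, 10, 20))
```

With this formulation the four hour-based branches collapse into a single expression, and the per-block minutes always add up to the length of the shift.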

Ultrasonic sensor as counter

I am trying to code ultrasonic sensors to count the number of cars in a parking lot. I am relatively new to Python, so I am asking here for help.
I have three parking slots, each of which has an ultrasonic sensor.
How do I make it so that the sensors and their counters work together? For example, when the parking slots are empty, the counter shows three parking slots available. When two parking slots are filled, the counter shows one availability, etc.
I have done the following code, and I am wondering how I could continue to achieve my objective?
# Sensor 1
def distance_1():
    time.sleep(0.5)
    GPIO.output(TRIG_1, True)
    time.sleep(0.00001)
    GPIO.output(TRIG_1, False)
    print("Reading Sensor 1")
    while GPIO.input(ECHO_1) == 0:
        start = time.time()
    while GPIO.input(ECHO_1) == 1:
        end = time.time()
    duration = end - start
    sound = 34000 / 2
    distance = duration * sound
    round(distance, 0)
    total = 3
    count = total
    if distance <= 10:
        count -= 1
    elif distance > 10:
        count += 1
    if count < 0:
        count = 0
    elif count > total:
        count = total
    print(count)
    mylcd.lcd_display_string("{}".format(count), 2)
# Sensor 2
def distance_2():
    time.sleep(0.5)
    GPIO.output(TRIG_2, True)
    time.sleep(0.00001)
    GPIO.output(TRIG_2, False)
    print("Reading Sensor 2")
    while GPIO.input(ECHO_2) == 0:
        start = time.time()
    while GPIO.input(ECHO_2) == 1:
        end = time.time()
    duration = end - start
    sound = 34000 / 2
    distance = duration * sound
    round(distance, 0)
    total = 3
    count = total
    if distance <= 10:
        count -= 1
    elif distance > 10:
        count += 1
    if count < 0:
        count = 0
    elif count > total:
        count = total
    print(count)
    mylcd.lcd_display_string("{}".format(count), 2)
# Sensor 3
def distance_3():
    time.sleep(0.5)
    GPIO.output(TRIG_3, True)
    time.sleep(0.00001)
    GPIO.output(TRIG_3, False)
    print("Reading Sensor 3")
    while GPIO.input(ECHO_3) == 0:
        start = time.time()
    while GPIO.input(ECHO_3) == 1:
        end = time.time()
    duration = end - start
    sound = 34000 / 2
    distance = duration * sound
    round(distance, 0)
    total = 3
    count = total
    if distance <= 10:
        count -= 1
    elif distance > 10:
        count += 1
    if count < 0:
        count = 0
    elif count > total:
        count = total
    print(count)
    mylcd.lcd_display_string("{}".format(count), 2)
while True:
    distance_1()
    distance_2()
    distance_3()
GPIO.cleanup()

The trouble with programming is there are so many ways to achieve the same result.
Looking at your code, I would suggest taking a step back and refactoring it to use Python classes instead. You have a lot of code repetition happening, and eventually, the code will break if you need to keep adding more sensors.
For example:
class Parking:
"This is a parking class"
def __init__(self, space):
self.space = space
def empty(self):
if self.space == 0:
print('Parking space is empty')
def full(self):
if self.space == 1:
print('Parking space is full')
def distance(self):
time.sleep(0.5)
GPIO.output(TRIG, True)
. . .
# Input:
sensor1 = Parking(1)
sensor2 = Parking(1)
sensor3 = Parking(0)
# Output:
sensor1.empty()
sensor2.empty()
sensor3.empty()
# Output:
sensor1.full()
sensor2.full()
sensor3.full()
You can then update a dictionary with the output to monitor the latest sensor information. Ideally, the dictionary would be written to a central file accessible by all the sensors or raspberry pis to read.
available_spaces = {"sensor1": 0, "sensor2": 1, "sensor3": 0}

I analyzed your code and did some refactoring.
I suggest initializing constants up front (the configuration section in the code below). The pin values I used are arbitrary.
Functions can take parameters, so you can pass arguments instead of writing the same code several times with only a few values changed.
You should set up the board when the script starts, to tell it how each pin is used (as input or output).
I didn't dig into the part of your snippet around lcd_display_string or why you are doing those operations. I suppose they are required to show the distance on the screen.
## configuration
# assumed imports: the question's snippet omits them
import time
import RPi.GPIO as GPIO
# trigger
TRIG_1 = 17
TRIG_2 = 27
TRIG_3 = 22
# echo
ECHO_1 = 10
ECHO_2 = 9
ECHO_3 = 11
# timings
INITIAL_DELAY = 0.5
TRIGGERING_DELAY = 0.00001
## support functions
# initializing GPIO
def set_up():
    # assumption: the pin numbers above use BCM numbering
    GPIO.setmode(GPIO.BCM)
    # set trigger GPIOs as output pins
    GPIO.setup(TRIG_1, GPIO.OUT)
    GPIO.setup(TRIG_2, GPIO.OUT)
    GPIO.setup(TRIG_3, GPIO.OUT)
    # set echo GPIOs as input pins
    GPIO.setup(ECHO_1, GPIO.IN)
    GPIO.setup(ECHO_2, GPIO.IN)
    GPIO.setup(ECHO_3, GPIO.IN)
# I didn't dig into these values or why you are doing these operations. I suppose they are required to print the distance on screen.
# mylcd is the LCD object from the question's own setup (not shown there).
def print_distance_on_lcd(distance):
    total = 3
    count = total
    if distance <= 10:
        count -= 1
    elif distance > 10:
        count += 1
    if count < 0:
        count = 0
    elif count > total:
        count = total
    print(count)
    mylcd.lcd_display_string("{}".format(count), 2)
def trigger(pin):
    time.sleep(INITIAL_DELAY)
    GPIO.output(pin, True) # set output pin on HIGH state
    time.sleep(TRIGGERING_DELAY)
    GPIO.output(pin, False) # set output pin on LOW state
def distance(t, echo):
    trigger(t)
    # initializing the variables here guarantees they exist even if a loop body never runs
    # using variable names that explain their content
    start_time = time.time()
    end_time = time.time()
    # wait for the echo pulse to begin; the last timestamp taken marks the pulse start
    while GPIO.input(echo) == 0:
        start_time = time.time()
    # wait for the echo pulse to end; the last timestamp taken marks the pulse end
    while GPIO.input(echo) == 1:
        end_time = time.time()
    duration = end_time - start_time
    sound = 34000 / 2
    distance = duration * sound
    return distance
# call initialization function (this will be executed only one time)
set_up()
# loop forever
while True:
    print("Reading Sensor 1")
    distance_sensor_1 = distance(TRIG_1, ECHO_1)
    print_distance_on_lcd(distance_sensor_1)
    print("Reading Sensor 2")
    distance_sensor_2 = distance(TRIG_2, ECHO_2)
    print_distance_on_lcd(distance_sensor_2)
    print("Reading Sensor 3")
    distance_sensor_3 = distance(TRIG_3, ECHO_3)
    print_distance_on_lcd(distance_sensor_3)
GPIO.cleanup()
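For the sensors to share one counter, the count has to be derived from all readings together rather than reset inside each per-sensor call. A minimal hardware-free sketch follows. The name available_slots is hypothetical, the 10 cm threshold is taken from the question, and the sample distances are invented for illustration.

```python
OCCUPIED_THRESHOLD_CM = 10  # same 10 cm cut-off used in the question

def available_slots(distances_cm, threshold_cm=OCCUPIED_THRESHOLD_CM):
    """Return how many slots are free, given one distance reading per slot."""
    occupied = sum(1 for d in distances_cm if d <= threshold_cm)
    return len(distances_cm) - occupied

# All three slots empty: every sensor sees a long distance.
print(available_slots([150.0, 142.3, 160.8]))  # 3
# Two slots filled: two sensors see a car closer than the threshold.
print(available_slots([4.2, 7.9, 155.0]))      # 1
```

In the main loop, the three measured distances would be collected into a list on each pass, and the single result would be sent to the LCD once per pass.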

Difference between two similar if loops in Python

I have two snippets that should do the same thing, but the first gives me no result while the second produces output.
if (Method == "EMM" ):
    if ((Loan_Obligation/12)+EMI) !=0:
        DSCR_Post = EBITDA_EMM/((Loan_Obligation/12)+EMI)
    else:
        0
elif (Method != "EMM" ):
    if ((Loan_Obligation/12)+EMI) !=0:
        DSCR_Post = EBITDA/((Loan_Obligation/12)+EMI)
    else:
        0
and the other one is:
if (Method == "EMM"):
    DSCR_Post = EBITDA_EMM/((Loan_Obligation/12)+EMI) if ((Loan_Obligation/12)+EMI) !=0 else 0
else:
    DSCR_Post = EBITDA/((Loan_Obligation/12)+EMI) if ((Loan_Obligation/12)+EMI) !=0 else 0
print('DSCR_Post:',DSCR_Post)
Can someone explain the difference between the two snippets?

In your first code snippet, you are not assigning the 0 to DSCR_Post as you do in the second. Modify as follows:
if Method == "EMM" :
    if (Loan_Obligation / 12) + EMI !=0:
        DSCR_Post = EBITDA_EMM / ((Loan_Obligation / 12) + EMI)
    else:
        DSCR_Post = 0 # the 0 has to be assigned!
else: # you do not need a condition here! It can either be equal or not, no third state possible.
    if (Loan_Obligation / 12) + EMI !=0:
        DSCR_Post = EBITDA / ((Loan_Obligation / 12) + EMI)
    else:
        DSCR_Post = 0
print('DSCR_Post:',DSCR_Post)
Which can be simplified to the following:
ebid = EBITDA_EMM if Method == "EMM" else EBITDA
DSCR_Post = 0 # 0 will be overwritten if ...
if (Loan_Obligation / 12) + EMI != 0:
    DSCR_Post = ebid / ((Loan_Obligation / 12) + EMI)
print('DSCR_Post:',DSCR_Post)
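The same logic can be wrapped in a small function so that both branches can be exercised, including the zero-denominator case. This is only a sketch: the function name dscr_post and the sample figures are made up for illustration.

```python
def dscr_post(method, ebitda, ebitda_emm, loan_obligation, emi):
    """Return the ratio, or 0 when the monthly debt service is zero."""
    monthly_debt = loan_obligation / 12 + emi
    earnings = ebitda_emm if method == "EMM" else ebitda
    # The conditional expression always yields a value that gets returned.
    # A bare `0` statement in an else branch, as in the first snippet of the
    # question, is evaluated and discarded without assigning anything.
    return earnings / monthly_debt if monthly_debt != 0 else 0

print(dscr_post("EMM", 100, 200, 1200, 100))    # 1.0
print(dscr_post("other", 100, 200, 1200, 100))  # 0.5
print(dscr_post("EMM", 100, 200, 0, 0))         # 0
```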

Traceback in Needleman-Wunsch global alignment without storing pointer

My understanding is that, while basically every discussion of dynamic programming I can find has you store pointers as the matrix is populated, it is faster to re-calculate the previous cells during the traceback step instead.
I have my dynamic programming algorithm to build the matrix correctly as far as I can tell, but I am confused on how to do the traceback calculations. I also have been told that it is necessary to recalculate the values (instead of just looking them up) but I don't see how that will come up with different numbers.
The version of SW I am implementing includes an option for gaps in both sequences to open up, so the recurrence relation for each matrix has three options. Below is the current version of my global alignment class. From my hand calculations I believe that score_align properly generates the matrix, but obviously traceback_col_seq does not work.
INF = 2147483647 #max size of int32
class global_aligner():
    def __init__(self, subst, open=10, extend=2, double=3):
        self.extend, self.open, self.double, self.subst = extend, open, double, subst
    def __call__(self, row_seq, col_seq):
        #add alphabet error checking?
        score_align(row_seq, col_seq)
        return traceback_col_seq()
    def init_array(self):
        self.M = zeros((self.maxI, self.maxJ), int)
        self.Ic = zeros((self.maxI, self.maxJ), int)
        self.Ir = zeros((self.maxI, self.maxJ), int)
        for i in xrange(self.maxI):
            self.M[i][0], self.Ir[i][0], self.Ic[i][0] = \
                -INF, -INF, -(self.open+self.extend*i)
        for j in xrange(self.maxJ):
            self.M[0][j], self.Ic[0][j], self.Ir[0][j] = \
                -INF, -INF, -(self.open+self.extend*j)
        self.M[0][0] = 0
        self.Ic[0][0] = -self.open
    def score_cell(self, i, j, chars):
        thisM = [self.Ic[i-1][j-1]+self.subst[chars], self.M[i-1][j-1]+\
                 self.subst[chars], self.Ir[i-1][j-1]+self.subst[chars]]
        thisC = [self.Ic[i][j-1]-self.extend, self.M[i][j-1]-self.open, \
                 self.Ir[i][j-1]-self.double]
        thisR = [self.M[i-1][j]-self.open, self.Ir[i-1][j]-self.extend, \
                 self.Ic[i-1][j]-self.double]
        return max(thisM), max(thisC), max(thisR)
    def score_align(self, row_seq, col_seq):
        self.row_seq, self.col_seq = list(row_seq), list(col_seq)
        self.maxI, self.maxJ = len(self.row_seq)+1, len(self.col_seq)+1
        self.init_array()
        for i in xrange(1, self.maxI):
            row_char = self.row_seq[i-1]
            for j in xrange(1, self.maxJ):
                chars = row_char+self.col_seq[j-1]
                self.M[i][j], self.Ic[i][j], self.Ir[i][j] = \
                    self.score_cell(i, j, chars)
    def traceback_col_seq(self):
        self.traceback = list()
        i, j = self.maxI-1, self.maxJ-1
        while i > 1 and j > 1:
            cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]]
            cellMax = max(cell)
            chars = self.row_seq[i-1]+self.col_seq[j-1]
            if cell.index(cellMax) == 0: #M
                diag = [diagM, diagC, diagR] = self.score_cell(i-1, j-1, chars)
                diagMax = max(diag)
                if diag.index(diagMax) == 0: #match
                    self.traceback.append(self.col_seq[j-1])
                elif diag.index(diagMax) == 1: #insert column (open)
                    self.traceback.append('-')
                elif diag.index(diagMax) == 2: #insert row (open other)
                    self.traceback.append(self.col_seq[j-1].lower())
                i, j = i-1, j-1
            elif cell.index(cellMax) == 1: #Ic
                up = [upM, upC, upR] = self.score_cell(i-1, j, chars)
                upMax = max(up)
                if up.index(upMax) == 0: #match (close)
                    self.traceback.append(self.col_seq[j-1])
                elif up.index(upMax) == 1: #insert column (extend)
                    self.traceback.append('-')
                elif up.index(upMax) == 2: #insert row (double)
                    self.traceback.append('-')
                i -= 1
            elif cell.index(cellMax) == 2: #Ir
                left = [leftM, leftC, leftR] = self.score_cell(i, j-1, chars)
                leftMax = max(left)
                if left.index(leftMax) == 0: #match (close)
                    self.traceback.append(self.col_seq[j-1])
                elif left.index(leftMax) == 1: #insert column (double)
                    self.traceback.append('-')
                elif left.index(leftMax) == 2: #insert row (extend other)
                    self.traceback.append(self.col_seq[j-1].lower())
                j -= 1
        for j in xrange(0,j,-1):
            self.traceback.append(self.col_seq[j-1])
        for i in xrange(0,i, -1):
            self.traceback.append('-')
        return ''.join(self.traceback[::-1])
test = global_aligner(blosumMatrix)
test.score_align('AA','AAA')
test.traceback_col_seq()

I think the main problem is that you aren't taking the matrix that you're currently in into account when generating the cells that you could potentially have come from. cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]] is right for the first time through the while loop, but after that you can't just choose the matrix that has the highest score. Your options are constrained by where you're coming from. I'm having a bit of trouble following your code, but I think you're taking that into account in the if statements in the while loop. If that's the case, then I think changes along the lines of these should be sufficient:
cell = [self.M[i][j], self.Ic[i][j], self.Ir[i][j]]
cellIndex = cell.index(max(cell))
while i > 1 and j > 1:
    chars = self.row_seq[i-1]+self.col_seq[j-1]
    if cellIndex == 0: #M
        diag = [diagM, diagC, diagR] = self.score_cell(i-1, j-1, chars)
        diagMax = max(diag)
        ...
        cellIndex = diag.index(diagMax) # remember which matrix we moved into
        i, j = i-1, j-1
    elif cellIndex == 1: #Ic
        up = [upM, upC, upR] = self.score_cell(i-1, j, chars)
        upMax = max(up)
        ...
        cellIndex = up.index(upMax)
        i -= 1
    elif cellIndex == 2: #Ir
        left = [leftM, leftC, leftR] = self.score_cell(i, j-1, chars)
        leftMax = max(left)
        ...
        cellIndex = left.index(leftMax)
        j -= 1
Like I said, I'm not positive that I'm following your code correctly, but see if that helps.
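As a point of comparison, here is a minimal pointer-free traceback for the simpler single-matrix case with a linear gap penalty. It is a deliberately reduced sketch, not the three-matrix affine-gap scheme from the question, and the scoring values (match=1, mismatch=-1, gap=-2) are arbitrary assumptions. At each cell, the traceback re-derives which predecessor could have produced the stored score instead of reading a stored pointer.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty and a pointer-free traceback."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # diagonal
                              score[i - 1][j] + gap,      # up: gap in b
                              score[i][j - 1] + gap)      # left: gap in a

    # Traceback: recompute each candidate move and follow one that matches.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], ''.join(reversed(out_a)), ''.join(reversed(out_b))

total, aligned_a, aligned_b = needleman_wunsch('AA', 'AAA')
print(total, aligned_a, aligned_b)
```

With affine gaps, the same idea applies per matrix: the recomputation has to be done for the matrix the traceback is currently in, which is the role of cellIndex in the answer above.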
