Python complete search in one pass function

I am writing a program that takes a list of start and end times for farmers milking cows and determines both the longest time during which at least one cow is being milked and the longest time during which no cows are being milked.
In it, I've tried using this function. It's an exercise on complete search, but it isn't fast enough when there's a lot of data (I think because there are n^2 iterations).
timesIS is simply a list of the times in increasing start order, and timesDE is a list of the same times in decreasing end order. timeIndex is the position from which to start. To find the longest milking interval, my program later calls this for every index and keeps the longest interval.
While still keeping to a complete search, how can I make this more efficient (closer to n passes, perhaps)?
def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
                if latestTime == timesDE[0][1]:
                    return latestTime
                break
    return latestTime
Here's a small piece of data input (first line is just the number of farmers):
6
100 200
200 400
400 800
800 1600
50 100
1700 3200
I think this is a minimal, complete, and verifiable example:
from operator import itemgetter
times = [[100,200], [200,400], [400,800], [800,1600], [50,100], [1700,3200]]
def nextCease(TimesIS, timesDE, timeIndex):
    latestTime = TimesIS[timeIndex][1]
    for j in range (0, len(timesDE)):
        for i in range (0, len(timesDE)):
            if timesDE[i][0]<=latestTime and timesDE[i][1]>=latestTime:
                latestTime = timesDE[i][1]
                if latestTime == timesDE[0][1]:
                    return latestTime
                break
    return latestTime
timesIS = sorted(times[:], key=itemgetter(0)) #increasing starttimes
timesDE = sorted(times[:], key=itemgetter(1), reverse=True) #decreasing endtimes
longestIntervalMilk = 0
for i in range (0, len(times)):
    interval = nextCease(timesIS, timesDE, i) - timesIS[i][0]
    if interval > longestIntervalMilk:
        longestIntervalMilk = interval
longestIntervalNoMilk = 0
latestFinish = 0
for i in range (0, len(times)):
    latestFinish = nextCease(timesIS, timesDE, i)
    timesIS2 = timesIS[:]
    while(timesIS2[0][0] < latestFinish):
        nextStartExists = True
        del timesIS2[0]
        if timesIS2 == []:
            nextStartExists = False
            break
    if nextStartExists == True:
        nextStart = timesIS2[0][0]
        longestIntervalNoMilk = nextStart - latestFinish
print(str(longestIntervalMilk) + " " + str(longestIntervalNoMilk) + "\n")
EDIT: In the meantime, I wrote this. It gives the wrong output for a very long list (it's 1001 lines, so I won't reprint it here, but you can find it at http://train.usaco.org/usacodatashow?a=iA4oZAAX7KZ) and I'm confused as to why:
times = sorted(times[:], key=itemgetter(0))
def longestMilkInterval(times):
    earliestTime = times[0]
    latestTime = times[0][1]
    interval = 0
    for i in range (1, len(times)):
        if times[i][1] > latestTime and times[i][0] <= latestTime:
            if times[i][1] - earliestTime[0] > interval:
                interval = times[i][1] - earliestTime[0]
            latestTime = times[i][1]
        else:
            earliestTime = times[i]
            latestTime = times[i][1]
            print(earliestTime)
    return interval
def longestNoMilkInterval(times):
    earliestTime = times[0][1]
    interval = 0
    for i in range (0, len(times)):
        if times[i][0] >= earliestTime:
            if times[i][0] - earliestTime > interval:
                interval = times[i][0] - earliestTime
                break
        else:
            earliestTime = times[i][1]
    return interval
Output should be 912 184 (>=1 cow, 0 cows).

Here is a very straightforward approach that does it in one pass after a sort, so the complexity is O(n log n).
# Part 1: transform to tuples (start_time, typ)
items = []
for start, end in times:
    items += [(start, 's'), (end, 'e')]
items = sorted(items)
# Part 2: compute max durations where 0 or 1+ cows are being milked
max_0_cows = max_1plus_cows = 0
last_intersection_time = items[0][0] # starting with first cow milk time
nof_cows_milked = 1
for i, (start_time, typ) in enumerate(items[1:], 1):
    if items[i-1][0] == start_time and items[i-1][1] != typ:
        continue
    if i+1 < len(items) and items[i+1][0] == start_time and items[i+1][1] != typ:
        continue
    if typ == 's':
        nof_cows_milked += 1
    elif typ == 'e':
        nof_cows_milked -= 1
    # check if we cross from 1+ -> 0 or 0 -> 1+
    if (typ, nof_cows_milked) in (('e', 0), ('s', 1)):
        duration = start_time - last_intersection_time
        if nof_cows_milked == 1:
            max_0_cows = max(max_0_cows, duration)
        if nof_cows_milked == 0:
            max_1plus_cows = max(max_1plus_cows, duration)
        last_intersection_time = start_time
print("Max time 0 cows: {}, Max time 1+ cows: {}".format(max_0_cows, max_1plus_cows))
Building the items: the start/end intervals are put into a list of tuples (time, typ), so we can traverse the sorted list: an 's' means a new cow starts being milked and an 'e' means a cow stops being milked. This way we have a counter, nof_cows_milked, at every point in time, which is the basis for computing the "max time 0 cows milked" and "max time 1+ cows milked".
The actual longest-time finder checks for all transitions from 0 -> 1+ cows milked or from 1+ -> 0 cows milked. Its first 4 lines filter out the cases where two intervals are adjacent (one farmer stops when another farmer starts). It records the transition times in last_intersection_time and compares each duration against max_0_cows and max_1plus_cows. Again, this part is not very pretty; there may be more elegant ways to solve it.
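A possible tidier variant of the same event sweep, offered as a sketch rather than the answer's code: if events at the same time are ordered so that starts come before ends, an interval that ends exactly when another begins never drops the counter to zero, so the two skip checks are not needed. The function name `milk_durations` and the 0/1 event encoding are illustrative choices.

```python
def milk_durations(times):
    """Return (longest stretch with >= 1 cow milked, longest stretch with none).

    Each [start, end] pair becomes two events. Kind 0 marks a start and kind 1
    an end, so at equal times plain tuple sorting processes starts first and
    back-to-back intervals never drop the counter to zero.
    """
    events = []
    for start, end in times:
        events.append((start, 0))  # a cow starts being milked
        events.append((end, 1))    # a cow stops being milked
    events.sort()

    milked = 0          # cows being milked right now
    busy_since = None   # start of the current 1+ stretch
    idle_since = None   # start of the current 0 stretch (None before the first start)
    longest_busy = longest_idle = 0
    for t, kind in events:
        if kind == 0:
            if milked == 0:
                # 0 -> 1+: close the idle stretch, if there was one
                if idle_since is not None:
                    longest_idle = max(longest_idle, t - idle_since)
                busy_since = t
            milked += 1
        else:
            milked -= 1
            if milked == 0:
                # 1+ -> 0: close the busy stretch
                longest_busy = max(longest_busy, t - busy_since)
                idle_since = t
    return longest_busy, longest_idle


print(milk_durations([[100, 200], [200, 400], [400, 800], [800, 1600], [50, 100], [1700, 3200]]))
# prints (1550, 100)
```

The sort is the only O(n log n) step; the sweep itself visits each event once.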
[my algorithm] gives the wrong output [...] and I'm confused as to why
Your algorithm basically just checks for the longest interval of a single tuple, but doesn't check for overlapping or adjacent tuples.
Take these intervals as a visual example:
Your code just finds the interval G-H, whereas you need to find C-F. You need to keep track somewhere of how many cows are being milked in parallel, so you need at least a counter like nof_cows_milked in my code example.

Related

Python if statement with multiple condition is messing up?

I'm a beginner with Python. I have a 2-d array called infected that stores values that correspond with the index. This bit of code is messy, but basically what I'm trying to do is simulate an infectious disease spreading over a number of days (T). The individual is infectious for infTime many days and then goes into recovery where they are immune for immTime days. There's also a probability value for whether a node will be infected and a value for how many nodes they will be connected to.
My problem is that I'm also trying to track the number of individuals currently susceptible, infected, or immune, but something is going wrong in the elif statement that is marked "# Messing up in this loop". Currently, the program is running through the statement more times than it should, which is throwing off the variables. If I switch the conditions in the elif statement, the program doesn't go through it and will stay at a very low number of infected individuals the entire time. I'm really stuck and I can't find any reason why it's not working how I want it to.
Code:
# Loop through T days, checking for infected individuals and connecting them to beta num of nodes, possibly infecting
infTime = 5 # Time spent infected before becoming immune
immTime = 20 # Time spent immune before becoming susceptible again
numSus = N - count
day = 0
while day < T:
    for a in range(len(infected)):
        nextnode = random.randint(0, N-1)
        if((infected[a][0] == 1) and (infected[a][3] < infTime)):
            num = infected[a][1]
            for b in range(num-1):
                if((a != nextnode) and (infected[nextnode][0] == 0)):
                    infected[a][3] += 1
                    chance = round((random.uniform(0, 1)), 2)
                    if(infected[nextnode][2] > chance):
                        infected[nextnode][0] = 1
                        G.add_edge(a, nextnode)
                        count += 1
                        numInf += 1
                        numSus -= 1
                elif((a != nextnode) and (infected[nextnode][0] == 1)):
                    G.add_edge(a, nextnode)
        elif((infected[a][0] == 1) and (infected[a][3] == infTime)): # Messing up in this loop
            infected[a][3] = 0
            infected[a][4] = 1
            numImm += 1
            numInf -= 1
            G.add_edge(a, nextnode)
        elif((infected[a][0] == 0) and (1 < infected[a][4] < immTime)):
            infected[a][4] += 1
        elif((infected[a][0] == 0) and (infected[a][4] == immTime)):
            infected[a][4] = 0
            numImm -= 1
            numSus =+ 1
    day += 1
    print("Number of infected on day ", day, ": ", count)
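Two things in the code as shown are worth checking: the branch marked "# Messing up in this loop" resets infected[a][3] but leaves infected[a][0] at 1, so the same node can go back through the infecting branch and reach that branch again; and `numSus =+ 1` assigns +1 to numSus rather than incrementing it. One way to keep the susceptible/infected/immune tallies from drifting is to store a single state and timer per node and recount the states each day. The sketch below uses hypothetical names (`step`, the labels "S"/"I"/"R", an adjacency dict `neighbours`), leaves out the networkx graph, and is not the asker's data layout.

```python
import random
from collections import Counter

# Hypothetical state labels: susceptible, infected, immune ("recovered")
SUS, INF, IMM = "S", "I", "R"


def step(states, timers, neighbours, p_infect, inf_time, imm_time, rng=random):
    """Advance every node by one day and return the new (states, timers).

    Decisions are made from yesterday's states only, so a node infected today
    does not spread the disease until tomorrow.
    """
    new_states = list(states)
    new_timers = list(timers)
    for node, state in enumerate(states):
        if state == INF:
            for other in neighbours[node]:
                if states[other] == SUS and rng.random() < p_infect:
                    new_states[other] = INF
                    new_timers[other] = 0
            new_timers[node] = timers[node] + 1
            if new_timers[node] >= inf_time:
                # infection over: the state itself changes, not just a timer
                new_states[node], new_timers[node] = IMM, 0
        elif state == IMM:
            new_timers[node] = timers[node] + 1
            if new_timers[node] >= imm_time:
                new_states[node], new_timers[node] = SUS, 0
    return new_states, new_timers


# Tiny example: a path 0 - 1 - 2 with node 0 infected and certain transmission.
neighbours = {0: [1], 1: [0, 2], 2: [1]}
states, timers = [INF, SUS, SUS], [0, 0, 0]
for day in range(1, 4):
    states, timers = step(states, timers, neighbours, 1.0, 2, 3)
    counts = Counter(states)  # tallies are recomputed, never adjusted by hand
    print(day, counts[SUS], counts[INF], counts[IMM])
```

Because the counts come from the states, a node can only be counted in one group at a time.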

How do I get the shortest repetition of something in an array?

Let's say you have a list which only has two types of values and goes something like ['t','r','r','r','t','t','r','r','t'], and you want to find the length of the shortest run of 'r's that has a 't' at both ends.
In this case the shortest run of 'r' has a length of 2: there is first t,r,r,r,t and then t,r,r,t, and the latter has the fewest 'r's in a row surrounded by 't', namely 2.
How would I code for finding that number?
This comes from a problem about going to a play with your friend: you want to sit as close as possible to your friend, so you are trying to find the smallest number of taken seats between two free seats. "#" is a taken seat and "." is a free seat. You are given the number of seats and the seating arrangement (free and taken seats), and the seats are all in one row.
An example of an input is:
5
#.##.
where there are two taken seats (##) between two free seats.
Here is my code, which works for the inputs I throw at it but fails on test inputs I can't see.
import sys
seats = int(input())
configuration = input()
seatsArray = []
betweenSeats = 1
betweenSeatsMin = 1
checked = 0
theArray = []
dotCount = 0
for i in configuration:
    seatsArray.append(i)
for i in range(len(seatsArray)):
    if i == len(seatsArray) - 1:
        break
    if seatsArray[i] == "." and seatsArray[i+1] == ".":
        print(0)
        sys.exit()
for i in range(0,len(seatsArray)):
    if i > 0:
        if checked == seats:
            break
    checked += 1
    if seatsArray[i] == "#":
        if i > 0:
            if seatsArray[i-1] == "#":
                betweenSeats += 1
    if seatsArray[i] == ".":
        dotCount += 1
        if dotCount > 1:
            theArray.append(betweenSeats)
            betweenSeats = 1
theArray = sorted(theArray)
if theArray.count(1) > 0:
    theArray.remove(1)
theArray = list(dict.fromkeys(theArray))
print(theArray[0])

This is a beginner-level, non-optimal approach to your problem: it uses one counter for the current run and one for the shortest run seen so far, compares them, and returns the minimum.
def finder(a, target):
    '''Return the length of the shortest run of target
    that has a non-target element on both sides (0 if none).'''
    max_counter = 0        # length of the run currently being counted
    min_counter = 0        # shortest bounded run found so far (0 = none yet)
    seen_boundary = False  # True once a non-target element has appeared
    for i in a:
        if i == target:
            # only count runs that started after a boundary element
            if seen_boundary:
                max_counter += 1
        else:
            # a boundary element closes the current run, so record it
            if max_counter > 0 and (min_counter == 0 or max_counter < min_counter):
                min_counter = max_counter
            max_counter = 0
            seen_boundary = True
    return min_counter
x = ['t','r','r','r','t','t','r','r','t','t','t','r','t']
y = 'r'
print(finder(x,y))

Create a string from the list, search it for the required pattern, count the 'r's in each match, and take the minimum of those counts.
Code:
import re
lst = ['t','r','r','r','t','t','r','r','t']
text = ''.join(lst)
pattern = '(?<=t)r+(?=t)'
smallest_r_seq = min(match.group().count('r') for match in re.finditer(pattern, text))
print(smallest_r_seq)
Output:
2
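Applied directly to the seat strings from the question, the same idea can skip the list conversion: take the part between the first and last free seat, split it on '.', and the shortest piece is the answer. This is a sketch; `fewest_taken_between_free` is a hypothetical name, and returning None when there are fewer than two free seats is an assumption about an edge case the problem text does not cover.

```python
def fewest_taken_between_free(config):
    """Smallest number of taken seats ('#') between two free seats ('.').

    Returns None when there are fewer than two free seats (an assumption:
    the problem statement does not say what to print then).
    """
    first = config.find('.')
    last = config.rfind('.')
    if first == -1 or first == last:
        return None
    # Between the first and last free seat, splitting on '.' leaves exactly
    # the runs of '#' that sit between consecutive free seats (possibly empty).
    gaps = config[first + 1:last].split('.')
    return min(len(gap) for gap in gaps)


print(fewest_taken_between_free("#.##."))  # 2
```

Two adjacent free seats produce an empty piece and therefore 0, which matches the print(0) case in the question's code.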

How to optimize an O(N*M) to be O(n**2)?

I am trying to solve USACO's Milking Cows problem. The problem statement is here: https://train.usaco.org/usacoprob2?S=milk2&a=n3lMlotUxJ1
Given a series of intervals in the form of a 2D array, I have to find the longest interval of continuous milking and the longest interval in which no milking was occurring.
Ex. Given the array [[500,1200],[200,900],[100,1200]], the longest interval would be 1100 as there is continuous milking and the longest interval without milking would be 0 as there are no rest periods.
I have tried looking at whether utilizing a dictionary would decrease run times but I haven't had much success.
f = open('milk2.in', 'r')
w = open('milk2.out', 'w')
#getting the input
farmers = int(f.readline().strip())
schedule = []
for i in range(farmers):
    schedule.append(f.readline().strip().split())
#schedule = data
minvalue = 0
maxvalue = 0
#getting the minimums and maximums of the data
for time in range(farmers):
    schedule[time][0] = int(schedule[time][0])
    schedule[time][1] = int(schedule[time][1])
    if (minvalue == 0):
        minvalue = schedule[time][0]
    if (maxvalue == 0):
        maxvalue = schedule[time][1]
    minvalue = min(schedule[time][0], minvalue)
    maxvalue = max(schedule[time][1], maxvalue)
filled_thistime = 0
filled_max = 0
empty_max = 0
empty_thistime = 0
#goes through all the possible items in between the minimum and the maximum
for point in range(minvalue, maxvalue):
    isfilled = False
    #goes through all the data for each point value in order to find the best values
    for check in range(farmers):
        if point >= schedule[check][0] and point < schedule[check][1]:
            filled_thistime += 1
            empty_thistime = 0
            isfilled = True
            break
    if isfilled == False:
        filled_thistime = 0
        empty_thistime += 1
    if (filled_max < filled_thistime) :
        filled_max = filled_thistime
    if (empty_max < empty_thistime) :
        empty_max = empty_thistime
print(filled_max)
print(empty_max)
if (filled_max < filled_thistime):
    filled_max = filled_thistime
w.write(str(filled_max) + " " + str(empty_max) + "\n")
f.close()
w.close()
The program works fine, but I need to decrease the time it takes to run.

A less pretty but more efficient approach is to treat this like a free list, though it is a bit trickier because the ranges can overlap. This method loops through the input list a single time, although each insert also scans the list of merged ranges, so the worst case is still quadratic.
def insert(start, end):
    for existing in times:
        existing_start, existing_end = existing
        # New time is a subset of existing time
        if start >= existing_start and end <= existing_end:
            return
        # New time ends during existing time
        elif end >= existing_start and end <= existing_end:
            times.remove(existing)
            return insert(start, existing_end)
        # New time starts during existing time
        elif start >= existing_start and start <= existing_end:
            # existing[1] = max(existing_end, end)
            times.remove(existing)
            return insert(existing_start, end)
        # New time is superset of existing time
        elif start <= existing_start and end >= existing_end:
            times.remove(existing)
            return insert(start, end)
    times.append([start, end])
data = [
    [500,1200],
    [200,900],
    [100,1200]
]
times = [data[0]]
for start, end in data[1:]:
    insert(start, end)
# the merged ranges are not kept in order, so sort them before looking for gaps
times.sort()
longest_milk = 0
longest_gap = 0
for i, time in enumerate(times):
    duration = time[1] - time[0]
    if duration > longest_milk:
        longest_milk = duration
    if i != len(times) - 1 and times[i+1][0] - times[i][1] > longest_gap:
        longest_gap = times[i+1][0] - times[i][1]
print(longest_milk, longest_gap)

As stated in the comments, if the input is already sorted, the complexity could be O(n); if it isn't, we need to sort it first and the complexity is O(n log n):
lst = [ [300,1000],
        [700,1200],
        [1500,2100] ]
from itertools import groupby
longest_milking = 0
longest_idle = 0
l = sorted(lst, key=lambda k: k[0])
for v, g in groupby(zip(l[::1], l[1::1]), lambda k: k[1][0] <= k[0][1]):
    l = [*g][0]
    if v:
        mn, mx = min(i[0] for i in l), max(i[1] for i in l)
        if mx-mn > longest_milking:
            longest_milking = mx-mn
    else:
        mx = max((i2[0] - i1[1] for i1, i2 in zip(l[::1], l[1::1])))
        if mx > longest_idle:
            longest_idle = mx
# corner case, N=1 (only one interval)
if len(lst) == 1:
    longest_milking = lst[0][1] - lst[0][0]
print(longest_milking)
print(longest_idle)
Prints:
900
300
For input:
lst = [ [500,1200],
[200,900],
[100,1200] ]
Prints:
1100
0
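The pairwise groupby compares each interval only with its immediate predecessor and keeps only the first pair of each group, so an overlapping chain such as [[100,200],[150,300],[250,400]] is reported as 200 rather than 300. A sketch of the standard sort-and-merge alternative, which tracks the running end of the current block (the function name is illustrative):

```python
def longest_milk_and_idle(intervals):
    """Return (longest continuous milking, longest idle gap).

    Sort by start time, then extend the current block while the next
    interval starts at or before the block's running end.
    """
    if not intervals:
        return 0, 0
    ordered = sorted(intervals)
    block_start, block_end = ordered[0]
    longest_milking = longest_idle = 0
    for start, end in ordered[1:]:
        if start <= block_end:
            # overlaps or touches the current block: extend it
            block_end = max(block_end, end)
        else:
            # gap: close the current block and start a new one
            longest_milking = max(longest_milking, block_end - block_start)
            longest_idle = max(longest_idle, start - block_end)
            block_start, block_end = start, end
    # the last block is still open when the loop ends
    longest_milking = max(longest_milking, block_end - block_start)
    return longest_milking, longest_idle


print(longest_milk_and_idle([[300, 1000], [700, 1200], [1500, 2100]]))  # (900, 300)
```

Keeping max(block_end, end) rather than the previous interval's end is what handles an interval nested inside an earlier, longer one.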

Determining Longest run of Heads and Tails

I have a question about my fourth function, LongestRun. I want to output the longest run of heads and the longest run of tails, based on how many flips (n) the user enters. I have tried a ton of different things, and nothing seems to work. Can you guys help me out?
import random

def LongestRun(n):
    H = 0
    T = 1
    myList = []
    for i in range(n):
        random.randint(0,1)
        if random.randint(0,1) == H:
            myList.append('H')
        else:
            myList.append('T')
I want this next piece to output two things:
"The longest run of heads was: " followed by the longest run of heads.
"The longest run of tails was: " followed by the longest run of tails.
Please help me! Thank you guys!

from itertools import groupby
my_list = [1,1,0,0,0,1,1,1,1,0,0,1,1,1,1,1,1,0,1]
max(len(list(v)) for k,v in groupby(my_list) if k==1)
is a fun way to group consecutive values and then take the length of the longest run of 1s. If you use 'H'/'T' instead, just change the if condition at the end (for example k=='H').
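A small extension of the same groupby idea that reports both faces in one pass over a list of 'H'/'T' strings; `longest_runs` is a hypothetical helper name:

```python
from itertools import groupby


def longest_runs(flips):
    """Return (longest run of 'H', longest run of 'T') in a sequence of flips."""
    longest = {'H': 0, 'T': 0}
    for face, run in groupby(flips):
        # run is an iterator over one block of identical, consecutive flips
        longest[face] = max(longest[face], sum(1 for _ in run))
    return longest['H'], longest['T']


print(longest_runs(list('HHTHHHTT')))  # (3, 2)
```

The list built by the question's LongestRun function could be passed in directly.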

I guess there is a way with better performance than my solution, but it gets you what you want.
You can also do the following with lists instead of np.arrays:
import numpy as np
n = 100
choices = ['H', 'T']
HT_array = np.random.choice(choices, n) # creates a 1-D array of n random 'H'/'T' entries
max_h = 0
max_t = 0
count_h = 0
count_t = 0
for item in HT_array:
    if item == 'H':
        count_h += 1
        count_t = 0
        if count_h > max_h:
            max_h = count_h
    elif item == 'T':
        count_t += 1
        count_h = 0
        if count_t > max_t:
            max_t = count_t
print(max_t)
print(max_h)

My not-so-optimized version:
def LongestRun(myList, lookFor='H'):
    current_longest = 0
    max_longest = 0
    for x in myList:
        if x == lookFor:
            current_longest+=1
            if current_longest > max_longest:
                max_longest = current_longest
        else:
            current_longest=0
    return max_longest
myList = 'H H H H H T H T H T T H T T T T T T H H H H H H H H H H H H H T'.split()
print(LongestRun(myList))
print(LongestRun(myList, 'T'))

As khuderm suggested, one solution is to have a counter that keeps track of the current run of heads or tails, and two variables that keep track of the max run for each one.
Here's what the process should look like:
Initialize counter, max_H and max_T to zero.
For each flip, if it differs from the previous one (an 'H' after a 'T' or vice versa), reset counter to zero.
Then, as you append the 'H' or 'T', increment counter by 1.
After incrementing counter, if the corresponding max is less than counter, update that max to the value of counter.
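The counter process can be sketched as follows, checking for a change of face before incrementing so that the first flip of a new run counts as 1. The helper takes an existing list of flips (for example the myList built in the question), and its name is illustrative:

```python
def longest_runs_counter(flips):
    """Return (max_H, max_T) using a single run counter."""
    counter = 0
    max_h = max_t = 0
    previous = None
    for flip in flips:
        if flip != previous:
            counter = 0  # the face changed, so a new run starts here
        counter += 1
        if flip == 'H':
            max_h = max(max_h, counter)
        else:
            max_t = max(max_t, counter)
        previous = flip
    return max_h, max_t


print(longest_runs_counter(list('HHTTTH')))  # (2, 3)
```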

Keep track of the longest sequence as you go, resetting each sequence after comparing the current to the longest sequence:
import random

def LongestRun(n):
    my_t, my_h = 0, 0
    long_h, long_t = 0, 0
    for i in range(n):
        if not random.randint(0, 1):
            my_h += 1
            # if we have heads, check current len of tails seq and reset
            if my_t > long_t:
                long_t = my_t
            my_t = 0
        else:
            # else we have tails, check current len of heads seq and reset
            my_t += 1
            if my_h > long_h:
                long_h = my_h
            my_h = 0
    # the last run is still open when the loop ends, so compare it too
    long_h = max(long_h, my_h)
    long_t = max(long_t, my_t)
    print("Longest run of heads was {}\nLongest run of tails was {}".format(long_h, long_t))
Output:
In [4]: LongestRun(1000)
Longest run of heads was 11
Longest run of tails was 13
In [5]: LongestRun(1000)
Longest run of heads was 7
Longest run of tails was 10
In [6]: LongestRun(1000)
Longest run of heads was 13
Longest run of tails was 8

leading number groups between two numbers

(Python) Given two numbers A and B, I need to find all nested "groups" of numbers between them, for example:
range(2169800, 2171194)
leading numbers: 21698XX, 21699XX, 2170XXX, 21710XX, 217110X, 217111X,
217112X, 217113X, 217114X, 217115X, 217116X, 217117X, 217118X, 2171190,
2171191, 2171192, 2171193, 2171194
or like this:
range(1000, 1452)
leading numbers: 10XX, 11XX, 12XX, 13XX, 140X, 141X, 142X, 143X,
144X, 1450, 1451, 1452
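One way to see the structure, offered as a standalone sketch rather than any answer's code: starting from A, repeatedly emit the largest block of 10**k numbers that starts at the current value, is aligned to a multiple of 10**k, and does not go past B. It treats B as inclusive, as the 1000 to 1452 example does, and assumes A and B have the same number of digits; `prefix_blocks` is a hypothetical name.

```python
def prefix_blocks(lo, hi):
    """Cover the inclusive range [lo, hi] with aligned blocks of 10**k numbers,
    returning each block as its leading digits padded with 'X'."""
    result = []
    while lo <= hi:
        k = 0
        # grow the block while lo stays aligned and the block stays within hi
        while lo % 10 ** (k + 1) == 0 and lo + 10 ** (k + 1) - 1 <= hi:
            k += 1
        result.append(str(lo // 10 ** k) + 'X' * k)
        lo += 10 ** k
    return result


print(prefix_blocks(1000, 1452))
```

For 1000 and 1452 this reproduces the list in the question: 10XX through 13XX, 140X through 144X, then 1450, 1451 and 1452.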

Harder than it first looked - pretty sure this is solid and will handle most boundary conditions. :) (There are few!!)
from functools import reduce

def leading(a, b):
    # generate digit pairs a=123, b=456 -> [(1, 4), (2, 5), (3, 6)]
    zip_digits = [(int(x), int(y)) for x, y in zip(str(a), str(b))]
    # this ignores problems where the last matching digits are 0 and 9
    # leading (12000, 12999) is same as leading(12, 12)
    while len(zip_digits) > 1 and zip_digits[-1] == (0, 9):
        zip_digits.pop()
    # start recursion
    return compute_leading(zip_digits)

def to_number(digits):
    # digits may be 10 or -1 after the adjustments below; the arithmetic carries them
    return reduce(lambda x, y: 10 * x + y, digits)

def compute_leading(zip_digits):
    low = to_number([tup[0] for tup in zip_digits])
    high = to_number([tup[1] for tup in zip_digits])
    if low > high:  # nothing left between the lows and the highs
        return []
    if len(zip_digits) == 1 or low // 10 == high // 10:
        # 1 digit, or both bounds share the same prefix: list them directly
        return list(range(low, high + 1))
    # now we partition the problem
    # given leading(123,456) we decompose this into 3 problems
    # lows -> leading(123,129)
    # middle -> leading(130,449) which we can recurse to leading(13,44)
    # highs -> leading(450,456)
    last_digits = zip_digits.pop()
    low_prefix = to_number([tup[0] for tup in zip_digits]) * 10   # base for lows e.g. 120
    high_prefix = to_number([tup[1] for tup in zip_digits]) * 10  # base for highs e.g. 450
    lows = list(range(low_prefix + last_digits[0], low_prefix + 10))
    highs = list(range(high_prefix + 0, high_prefix + last_digits[1] + 1))
    # check for boundary cases where lows or highs have all ten digits
    (a, b) = zip_digits.pop()  # pop last digits of middle so they can be adjusted
    if len(lows) == 10:
        lows = []
    else:
        a = a + 1
    if len(highs) == 10:
        highs = []
    else:
        b = b - 1
    zip_digits.append((a, b))  # push back last digits of middle after adjustments
    return lows + compute_leading(zip_digits) + highs  # and recurse - woohoo!!

print(leading(199, 411))
print(leading(2169800, 2171194))
print(leading(1000, 1452))

def foo(start, end):
    index = 0
    is_lower = False
    while index < len(start):
        if is_lower and start[index] == '0':
            break
        if not is_lower and start[index] < end[index]:
            first_lower = index
            is_lower = True
        index += 1
    return index-1, first_lower
start = '2169800'
end = '2171194'
result = []
while int(start) < int(end):
    index, first_lower = foo(start, end)
    range_end = 10 if index > first_lower else int(end[first_lower])
    for x in range(int(start[index]), range_end):
        result.append(start[:index] + str(x) + 'X'*(len(start)-index-1))
    if range_end == 10:
        start = str(int(start[:index])+1)+'0'+start[index+1:]
    else:
        start = start[:index] + str(range_end) + start[index+1:]
result.append(end)
print("Leading numbers:")
print(result)
I tested the examples you've given and it gives the right result. Hope this helps.

This should give you a good starting point:
def leading(start, end):
    leading = []
    hundreds = start // 100
    while (end - hundreds * 100) > 100:
        i = hundreds * 100
        leading.append(range(i,i+100))
        hundreds += 1
    c = hundreds * 100
    tens = 1
    while (end - c - tens * 10) > 10:
        i = c + tens * 10
        leading.append(range(i, i + 10))
        tens += 1
    c += tens * 10
    ones = 1
    while (end - c - ones) > 0:
        i = c + ones
        leading.append(i)
        ones += 1
    leading.append(end)
    return leading
OK, the whole thing could be one loop level deeper, but I thought it might be clearer this way. Hope this helps...
Update:
Now I see what you want. Furthermore, maria's code doesn't seem to be working for me. (Sorry...)
So please consider the following code:
def leading(start, end):
    depth = 2
    while 10 ** depth > end : depth -=1
    leading = []
    const = 0
    coeff = start // 10 ** depth
    while depth >= 0:
        while (end - const - coeff * 10 ** depth) >= 10 ** depth:
            # integer division keeps the prefix an int in Python 3
            leading.append(str(const // 10 ** depth + coeff) + "X" * depth)
            coeff += 1
        const += coeff * 10 ** depth
        coeff = 0
        depth -= 1
    leading.append(end)
    return leading
print(leading(199,411))
print(leading(2169800, 2171194))
print(leading(1000, 1453))
print(leading(1,12))
Now, let me try to explain the approach here.
The algorithm will try to find "end" starting from value "start" and check whether "end" is in the next 10^2 (which is 100 in this case). If it fails, it will make a leap of 10^2 until it succeeds. When it succeeds it will go one depth level lower. That is, it will make leaps one order of magnitude smaller. And loop that way until the depth is equal to zero (= leaps of 10^0 = 1). The algorithm stops when it reaches the "end" value.
You may also notice that I have implemented the wrapping loop I mentioned, so it is now possible to define the starting depth (or leap size) in a variable.
The first while loop makes sure the first leap does not overshoot the "end" value.
If you have any questions, just feel free to ask.
