Speeding Up Python Code Time

import csv
import math
import time
# df, interval, convertCoordinates and main are defined elsewhere in the program
start = time.time()
f = open('Speed_Test.csv','r+')
coordReader = csv.reader(f, delimiter = ',')
count = -1
successful_trip = 0
trips = 0
for line in coordReader:
    successful_single = 0
    count += 1
    R = interval*0.30
    if count == 0:
        continue  # skip the header row
    if 26 < float(line[0]) < 48.7537144 and 26 < float(line[2]) < 48.7537144 and -124.6521017 < float(line[1]) < -68 and -124.6521017 < float(line[3]) < -68:
        y2,x2,y1,x1 = convertCoordinates(float(line[0]),float(line[1]),float(line[2]),float(line[3]))
        coords_line,interval = main(y1,x1,y2,x2)
        for item in coords_line:
            loop_count = 0
            r = 0
            min_dist = 10000
            # find the closest df location that lies within radius R of this point
            for i in range(len(df)):
                dist = math.sqrt((item[1]-df.iloc[i,0])**2 + (item[0]-df.iloc[i,1])**2)
                if dist < R:
                    loop_count += 1
                    if dist < min_dist:
                        min_dist = dist
                        r = i
            if loop_count != 0:
                successful_single += 1
                df.iloc[r,2] += 1
        trips += 1
        if successful_single == (len(coords_line)):
            successful_trip += 1
end = time.time()
print('Percent Successful:',successful_trip/trips)
print((end - start))
I have this code, and explaining it fully would be extremely time-consuming, but it doesn't run fast enough for me to compute as much as I'd like. Is there anything anyone sees right off the bat that I could do to speed it up? Any suggestions would be greatly appreciated.
In essence, it reads in two latitude/longitude coordinates, converts them to Cartesian coordinates, and then walks through every coordinate along the path from the origin to the destination, using an interval length that depends on the distance. While doing this, it checks each point on the trip against a data frame (df) of 300+ coordinate locations, sees whether any of them is within radius R, and stores the closest one.

Take advantage of any opportunity to break out of a for loop once the result is known. For example, at the end of the for line loop you check whether successful_single == len(coords_line). That check is bound to fail as soon as if loop_count != 0 is False for any item, because successful_single is not incremented for that item and can then never reach len(coords_line). So you could break out of the for item loop right there: you already know it's not a "successful_trip." There may be other situations like this.
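The early exit described above might look roughly like the sketch below. It assumes the df locations have been pulled into a plain list of (x, y) tuples; the names check_trip, stations and radius are made up for illustration and do not appear in the question. Note that stopping early also means the per-location counters for the remaining points of a failed trip are never updated, which differs from the original loop if those counts matter.

```python
import math

def check_trip(coords_line, stations, radius):
    """Return (success, hits).

    hits holds, for each point checked so far, the index of the nearest
    station inside radius. The loop stops at the first point that has no
    station in range, because the trip can no longer be successful.
    """
    hits = []
    for px, py in coords_line:
        nearest, min_dist = None, radius
        for idx, (sx, sy) in enumerate(stations):
            dist = math.hypot(px - sx, py - sy)
            if dist < min_dist:
                nearest, min_dist = idx, dist
        if nearest is None:
            return False, hits  # early exit: result already known
        hits.append(nearest)
    return True, hits

# Hypothetical data: two stations and two short trips.
stations = [(0.0, 0.0), (10.0, 0.0)]
print(check_trip([(1.0, 0.0), (9.0, 0.0)], stations, 2.0))
print(check_trip([(1.0, 0.0), (5.0, 0.0), (9.0, 0.0)], stations, 2.0))
```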

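Have you considered using a process pool and running these calculations in parallel?
https://docs.python.org/2/library/multiprocessing.html
Note, though, that R is computed from interval, which is only updated later in the loop body. That means each row's R depends on the previous row's interval, so as written the loop carries a dependency between iterations and may have to run sequentially.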
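A minimal sketch of the pooling idea, under the assumption that the dependency between rows can be removed (here the radius is simply fixed). STATIONS, RADIUS, trip_succeeds and success_rate are hypothetical names standing in for the asker's df and loop body; only multiprocessing.Pool and Pool.map come from the standard library.

```python
import math
from multiprocessing import Pool

# Hypothetical stand-in for the coordinate columns of df.
STATIONS = [(0.0, 0.0), (10.0, 0.0)]
RADIUS = 2.0  # assumed constant; in the question R depends on interval

def trip_succeeds(path):
    """True if every point of one trip is within RADIUS of some station."""
    return all(
        any(math.hypot(px - sx, py - sy) < RADIUS for sx, sy in STATIONS)
        for px, py in path
    )

def success_rate(paths, workers=2):
    """Check independent trips in parallel; return the fraction that succeed."""
    with Pool(processes=workers) as pool:
        results = pool.map(trip_succeeds, paths)
    return sum(results) / len(results)
```

When running this as a script, call success_rate(...) from under an if __name__ == "__main__": guard so that worker processes can import the module safely. Whether the pool pays off depends on how expensive each trip is compared with the cost of sending it to a worker.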

Related

Python if statement with multiple condition is messing up?

I'm a beginner with Python. I have a 2-d array called infected that stores values that correspond with the index. This bit of code is messy, but basically what I'm trying to do is simulate an infectious disease spreading over a number of days (T). The individual is infectious for infTime many days and then goes into recovery where they are immune for immTime days. There's also a probability value for whether a node will be infected and a value for how many nodes they will be connected to.
My problem is that I'm also trying to track the number of individuals currently susceptible, infected, or immune, but something is going wrong in the elif statement that is marked "# Messing up in this loop". Currently, the program is running through the statement more times than it should, which is throwing off the variables. If I switch the conditions in the elif statement, the program doesn't go through it and will stay at a very low number of infected individuals the entire time. I'm really stuck and I can't find any reason why it's not working how I want it to.
Code:
# Loop through T days, checking for infected individuals and connecting them to beta num of nodes, possibly infecting
infTime = 5 # Time spent infected before becoming immune
immTime = 20 # Time spent immune before becoming susceptible again
numSus = N - count
day = 0
while day < T:
    for a in range(len(infected)):
        nextnode = random.randint(0, N-1)
        if((infected[a][0] == 1) and (infected[a][3] < infTime)):
            num = infected[a][1]
            for b in range(num-1):
                if((a != nextnode) and (infected[nextnode][0] == 0)):
                    infected[a][3] += 1
                    chance = round((random.uniform(0, 1)), 2)
                    if(infected[nextnode][2] > chance):
                        infected[nextnode][0] = 1
                        G.add_edge(a, nextnode)
                        count += 1
                        numInf += 1
                        numSus -= 1
                elif((a != nextnode) and (infected[nextnode][0] == 1)):
                    G.add_edge(a, nextnode)
        elif((infected[a][0] == 1) and (infected[a][3] == infTime)): # Messing up in this loop
            infected[a][3] = 0
            infected[a][4] = 1
            numImm += 1
            numInf -= 1
            G.add_edge(a, nextnode)
        elif((infected[a][0] == 0) and (1 < infected[a][4] < immTime)):
            infected[a][4] += 1
        elif((infected[a][0] == 0) and (infected[a][4] == immTime)):
            infected[a][4] = 0
            numImm -= 1
            numSus += 1
    day += 1
    print("Number of infected on day ", day, ": ", count)
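No answer to this question appears on the page. Two things in the loop look worth checking, though. The branch marked "# Messing up in this loop" starts the immunity counter but never sets infected[a][0] back to 0, so the same node seems to match the first branch again on the following day. And the test 1 < infected[a][4] < immTime is False while the counter is still 1, so the immunity counter would never advance. The sketch below is one possible way to keep the three states mutually exclusive; the Node class and advance_one_day function are illustrative names, not part of the asker's code.

```python
from dataclasses import dataclass

SUSCEPTIBLE, INFECTED, IMMUNE = "S", "I", "R"

@dataclass
class Node:
    status: str = SUSCEPTIBLE
    days_in_status: int = 0

def advance_one_day(node, inf_time, imm_time):
    """Age a node by one day and apply at most one state transition.

    Returns (old_status, new_status) so the caller can adjust its
    susceptible / infected / immune totals exactly once per transition.
    """
    old = node.status
    if old == SUSCEPTIBLE:
        return old, old
    node.days_in_status += 1
    if old == INFECTED and node.days_in_status >= inf_time:
        node.status, node.days_in_status = IMMUNE, 0
    elif old == IMMUNE and node.days_in_status >= imm_time:
        node.status, node.days_in_status = SUSCEPTIBLE, 0
    return old, node.status

# One infected node followed for 25 days with infTime = 5, immTime = 20.
node = Node(status=INFECTED)
history = [advance_one_day(node, 5, 20) for _ in range(25)]
changes = [step for step in history if step[0] != step[1]]
print(changes)
```

Because each call reports the old and new status, counters such as numInf and numImm only need to change when the two differ.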

How to calculate distance difference between pixels in tuple over frames?

Good day,
In each iteration step I have a p1 that describes the location of a person. p1 is a tuple, p1 = (x_point, y_point), giving the person's location in frame i.
Based on lines 95 to 109 of this article, https://www.pyimagesearch.com/2015/09/21/opencv-track-object-movement/, I am trying to modify those lines to measure how far a person has moved.
The problem can be reproduced with the following code, where a new p1 arrives on each iteration i (originally p1 is supplied by SORT tracking). I am dealing with a video at approximately 29 fps that contains multiple objects. Could the inner for loop over j produce a false result like the one in the sample image?
EDIT: It appears to me that the inner loop fails to handle detection of multiple objects, as in the sample image.
Thank you for your time.
from collections import deque
from random import randint
import numpy as np
(direction_x, direction_y) = (0, 0)
direction = ""
points_list = deque(maxlen=32)
def sample_of_p1():
    return (randint(0, 100),randint(0, 100))
for i in range(100):
    p1 = sample_of_p1()
    points_list.appendleft(p1)
    for j in range(1, len(points_list)):
        if(i >= 10):
            direction_x = points_list[-10][0] - points_list[j][0]
            direction_y = points_list[-10][1] - points_list[j][1]
            if np.abs(direction_x) > 0:
                dirx = "Right" if np.sign(direction_x) == 1 else "Left"
            if np.abs(direction_y) > 0:
                diry = "Top" if np.sign(direction_y) == 1 else "Bottom"
            if dirx != "" and diry != "":
                direction = "{} {}".format(diry, dirx)
            else:
                direction = dirx if dirx != "" else diry
        else:
            continue

The code seems to compute correctly, but there are some improvements you can make.
You can move the condition if i >= 10 outside the for j loop. It is a small optimization, but it is more elegant:
if i >= 10:
    for j in range(1, len(points_list)):
        # some code
else:
    continue
Also, you don't define dirx and diry before the conditions, so your program may throw an exception if there is no movement along one axis. In the article, they are initialized at line 109.
Finally, the condition np.abs(direction_x) > 0 seems a bit loose. Usually, when you want to detect movement, you set a minimum value (20 in the article, line 113) so that you catch significant movement and not just a shiver or a negligible shift.
Hope that helps.
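Putting the last two points together, a minimal sketch could look like this. The function name describe_direction and the min_move parameter are illustrative; the default of 20 is the threshold the answer quotes from the article, and the Right/Left/Top/Bottom labels follow the sign convention used in the question's code.

```python
def describe_direction(reference, point, min_move=20):
    """Label the movement between two (x, y) points.

    Differences are taken as reference - point, as in the question.
    Movements of at most min_move pixels along an axis are ignored.
    """
    dx = reference[0] - point[0]
    dy = reference[1] - point[1]
    dirx = diry = ""  # initialised up front, so no NameError is possible
    if abs(dx) > min_move:
        dirx = "Right" if dx > 0 else "Left"
    if abs(dy) > min_move:
        diry = "Top" if dy > 0 else "Bottom"
    if dirx and diry:
        return "{} {}".format(diry, dirx)
    return dirx or diry

print(describe_direction((50, 50), (10, 45)))  # only x moved far enough
print(describe_direction((0, 0), (30, 30)))    # both axes moved
print(describe_direction((5, 5), (6, 6)))      # negligible movement
```

For several tracked people, one option is to keep a separate deque of points per track ID (for example in a dict keyed by the tracker's ID) and call the function per track, so that points from different people are never compared with each other.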

How to optimize an O(N*M) to be O(n**2)?

I am trying to solve USACO's Milking Cows problem. The problem statement is here: https://train.usaco.org/usacoprob2?S=milk2&a=n3lMlotUxJ1
Given a series of intervals in the form of a 2D array, I have to find the longest interval of continuous milking and the longest interval in which no milking occurred.
Ex. Given the array [[500,1200],[200,900],[100,1200]], the longest interval would be 1100 as there is continuous milking and the longest interval without milking would be 0 as there are no rest periods.
I have tried looking at whether utilizing a dictionary would decrease run times but I haven't had much success.
f = open('milk2.in', 'r')
w = open('milk2.out', 'w')
#getting the input
farmers = int(f.readline().strip())
schedule = []
for i in range(farmers):
    schedule.append(f.readline().strip().split())
#schedule = data
minvalue = 0
maxvalue = 0
#getting the minimums and maximums of the data
for time in range(farmers):
    schedule[time][0] = int(schedule[time][0])
    schedule[time][1] = int(schedule[time][1])
    if (minvalue == 0):
        minvalue = schedule[time][0]
    if (maxvalue == 0):
        maxvalue = schedule[time][1]
    minvalue = min(schedule[time][0], minvalue)
    maxvalue = max(schedule[time][1], maxvalue)
filled_thistime = 0
filled_max = 0
empty_max = 0
empty_thistime = 0
#goes through all the possible items in between the minimum and the maximum
for point in range(minvalue, maxvalue):
    isfilled = False
    #goes through all the data for each point value in order to find the best values
    for check in range(farmers):
        if point >= schedule[check][0] and point < schedule[check][1]:
            filled_thistime += 1
            empty_thistime = 0
            isfilled = True
            break
    if isfilled == False:
        filled_thistime = 0
        empty_thistime += 1
    if (filled_max < filled_thistime) :
        filled_max = filled_thistime
    if (empty_max < empty_thistime) :
        empty_max = empty_thistime
print(filled_max)
print(empty_max)
if (filled_max < filled_thistime):
    filled_max = filled_thistime
w.write(str(filled_max) + " " + str(empty_max) + "\n")
f.close()
w.close()
The program works fine, but I need to decrease the time it takes to run.

A less pretty but more efficient approach would be to solve this like a free list, though it is a bit trickier since the ranges can overlap. This method only requires looping through the input list a single time.
def insert(start, end):
    for existing in times:
        existing_start, existing_end = existing
        # New time is a subset of existing time
        if start >= existing_start and end <= existing_end:
            return
        # New time ends during existing time
        elif end >= existing_start and end <= existing_end:
            times.remove(existing)
            return insert(start, existing_end)
        # New time starts during existing time
        elif start >= existing_start and start <= existing_end:
            times.remove(existing)
            return insert(existing_start, end)
        # New time is superset of existing time
        elif start <= existing_start and end >= existing_end:
            times.remove(existing)
            return insert(start, end)
    # No overlap with any existing range: store it as a new range
    times.append([start, end])
data = [
    [500,1200],
    [200,900],
    [100,1200]
]
times = [data[0]]
for start, end in data[1:]:
    insert(start, end)
# The gap check compares neighbouring ranges, so they must be in start order
times.sort()
longest_milk = 0
longest_gap = 0
for i, time in enumerate(times):
    duration = time[1] - time[0]
    if duration > longest_milk:
        longest_milk = duration
    if i != len(times) - 1 and times[i+1][0] - times[i][1] > longest_gap:
        longest_gap = times[i+1][0] - times[i][1]
print(longest_milk, longest_gap)

As stated in the comments, if the input is already sorted the complexity can be O(n). If it is not, we need to sort it first, and the complexity is O(n log n):
lst = [ [300,1000],
        [700,1200],
        [1500,2100] ]
from itertools import groupby
longest_milking = 0
longest_idle = 0
l = sorted(lst, key=lambda k: k[0])
# group consecutive pairs by whether the second interval overlaps the first
for v, g in groupby(zip(l[::1], l[1::1]), lambda k: k[1][0] <= k[0][1]):
    l = [*g][0]
    if v:
        mn, mx = min(i[0] for i in l), max(i[1] for i in l)
        if mx-mn > longest_milking:
            longest_milking = mx-mn
    else:
        mx = max((i2[0] - i1[1] for i1, i2 in zip(l[::1], l[1::1])))
        if mx > longest_idle:
            longest_idle = mx
# corner case, N=1 (only one interval)
if len(lst) == 1:
    longest_milking = lst[0][1] - lst[0][0]
print(longest_milking)
print(longest_idle)
Prints:
900
300
For input:
lst = [ [500,1200],
[200,900],
[100,1200] ]
Prints:
1100
0
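For comparison, here is a plain sort-and-sweep sketch of the same O(n log n) idea. It is a different technique from the groupby version above, written so that chains of three or more overlapping intervals and inputs with no overlaps at all are also covered. The function name longest_milking_and_idle is made up for illustration.

```python
def longest_milking_and_idle(intervals):
    """Return (longest continuous milking time, longest idle time)."""
    ordered = sorted(intervals)          # sort by start time
    cur_start, cur_end = ordered[0]      # the merged block being extended
    longest_milk = 0
    longest_idle = 0
    for start, end in ordered[1:]:
        if start <= cur_end:
            # Overlaps or touches the current block: extend it
            cur_end = max(cur_end, end)
        else:
            # Gap found: close the current block and start a new one
            longest_milk = max(longest_milk, cur_end - cur_start)
            longest_idle = max(longest_idle, start - cur_end)
            cur_start, cur_end = start, end
    longest_milk = max(longest_milk, cur_end - cur_start)
    return longest_milk, longest_idle

print(longest_milking_and_idle([[300, 1000], [700, 1200], [1500, 2100]]))
print(longest_milking_and_idle([[500, 1200], [200, 900], [100, 1200]]))
```

After sorting, each interval is visited once, so the sweep itself is linear and the sort dominates the running time.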

python (passing parameters to functions)

I'm not really new to Python, but I came across this problem and it has puzzled me.
I was solving the maze runner problem using A* and then trying to find the hardest possible maze for a given dimension. For this purpose, I created a function called generateHardMaze() that is called from the main function and takes an argument newMaze.
Now here is where things get weird: when I change the value of newMaze in the if condition within the while loop, the hardMaze value changes too, without the code entering the second if condition. I'm not sure why this is happening and was hoping someone could help me.
I'm using PyCharm as my IDE and Python 3.6.*, if that makes any difference.
I'm sure this isn't how OOP works, but I'm thinking this is a Python thing. Has anyone ever come across anything like this? If yes, please sympathize.
Thanks in advance.
def solveMazeAManH(newMaze,rows,cols):
    startTime = time.time()
    backTrackPriority = []
    setup_cells(rows, cols)
    # start and end points of the maze
    start = (0, 0)
    end = (rows - 1, cols - 1)
    current = start
    print("The path to be taken is: ")
    print(current)
    frinLength = 0
    # traversing the neighbours
    while current != end:
        unvisited.remove(current)
        neighboursDFSandA(newMaze, current, rows, cols)
        heuristic = calManhattanDis(current, end) # finding the heuristic for every traversal
        try:
            if not currentNeighbours:
                if not backTrackPriority:
                    print("No path available!")
                    return 0
                else:
                    while not currentNeighbours:
                        current = nextPopMan(backTrackPriority, end)
                        backTrackPriority.remove(current)
                        neighboursDFSandA(newMaze, current, rows, cols)
            neighbor = leastPathChildMan(heuristic, current, end)
            backTrackPriority.append(current)
            current = neighbor
            print(current)
            frinLength += 1
        except:
            print("No path Found!")
            return 0
    # report the elapsed time before returning, so these lines are reachable
    endTime = time.time()
    print("The time taken to solve the maze using A* with manhattan distance: ")
    print(endTime - startTime)
    return frinLength
def generateHardMaze(newMazes):
    rows = len(newMazes)
    cols = len(newMazes[0])
    hardMaze = newMaze
    print("Solving the original maze!")
    fringLength = solveMazeAManH(newMazes, rows, cols)
    print("Creating new harder Maze:")
    pFlag = True
    pCout = 0
    while pFlag:
        count = 0
        flag = True
        while flag:
            point = choice(setup_cells(rows, cols))
            if (newMazes[point[0]][point[1]] == 1):
                newMazes[point[0]][point[1]] = 0
            else:
                newMazes[point[0]][point[1]] = 1
            if (fringLength < solveMazeAManH(newMazes, rows, cols)):
                print("Harder Maze--------------------")
                hardMaze = newMaze
                fringLength = solveMazeAManH(newMazes, rows, cols)
                count = 0
            else:
                count += 1
            if count >= 10:
                flag = False
                print("one")
        newMazes = creatMaze(rows)
        pCout += 1
        if pCout >= 100:
            pFlag = False
    print(hardMaze)
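No answer to this question appears on the page, but the behaviour described matches how assignment works in Python: hardMaze = newMaze does not copy the maze, it only binds a second name to the same list object, so every later cell flip is visible through both names. The short demonstration below uses a made-up 2x2 maze; copy.deepcopy from the standard library is one way to take an independent snapshot of a list of lists.

```python
import copy

maze = [[0, 1], [1, 0]]

alias = maze                     # same object under a second name
shallow = list(maze)             # new outer list, but the rows are shared
snapshot = copy.deepcopy(maze)   # fully independent copy

maze[0][0] = 1                   # flip one cell, as generateHardMaze does

print(alias is maze)             # True: no copy was made
print(alias[0][0])               # 1: the change shows through the alias
print(shallow[0][0])             # 1: the shared row changed as well
print(snapshot[0][0])            # 0: the deep copy kept the old value
```

In the question's code, that would suggest storing something like copy.deepcopy(newMazes) in hardMaze at the point where a harder maze is found, assuming the maze is a list of lists as the indexing implies.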

Project Euler Project 67 - Python

I am doing the Project Euler #67 in Python. My program, which worked for Project 18, does not work for Project 67.
Code (excludes the opening of the file and the processing of information):
for i in range(len(temp)):
    list1 = temp[i]
    try:
        list2 = temp[i+1]
        trynum1 = list1[lastinput] + max(list2[lastinput],list2[lastinput+1])
        try:
            trynum2 = list1[lastinput+1] + max(list2[lastinput+1],list2[lastinput+2])
            if trynum1 > trynum2:
                outputlist.append(list1[lastinput])
            else:
                outputlist.append(list1[lastinput+1])
                lastinput += 1
        except IndexError:
            outputlist.append(list1[0])
    except IndexError:
        if list1[lastinput] > list1[lastinput+1]:
            outputlist.append(list1[lastinput])
        else:
            outputlist.append(list1[lastinput+1])
Variables:
temp is the triangle of integers
outputlist is a list which stores the numbers chosen by the program
I know the answer is 7273, but my program finds 6542. I cannot find the error that causes this. Could you please help me with it?
Logic
My approach to this program is to find one number (list1[lastinput]) and add it up with the larger number of the two below it (trynum1), compare with the number to the right of the first number (list1[lastinput+1]), adding the larger number of two below it (trynum2). I append the larger one to the output list.

This approach is logically flawed. When you're in row 1, you don't have enough information to know whether moving right or left will lead you to the largest sum, not with only a 2-row lookahead. You would need to look all the way to the bottom to ensure getting the best path.
As others have suggested, start at the bottom and work up. Remember, you don't need the entire path, just the sum. At each node, add the amount of the better of the two available paths (that's the score you get in taking that node to the bottom). When you get back to the top, temp[0][0], that number should be your final answer.
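A small self-contained sketch of both ideas: a greedy walk that looks only one row ahead, and the bottom-up reduction described above. The function names and the tiny first triangle are made up for illustration; the second triangle is the four-row example from Project Euler Problem 18, whose maximum path sum is 23.

```python
def greedy_path_sum(triangle):
    """Walk down from the top, always stepping to the larger child."""
    total, j = 0, 0
    for i, row in enumerate(triangle):
        total += row[j]
        if i + 1 < len(triangle) and triangle[i + 1][j + 1] > triangle[i + 1][j]:
            j += 1
    return total

def max_path_sum(triangle):
    """Bottom-up: best[j] is the best sum from position j down to the base."""
    best = list(triangle[-1])
    for row in reversed(triangle[:-1]):
        best = [value + max(best[j], best[j + 1]) for j, value in enumerate(row)]
    return best[0]

tricky = [[1],
          [2, 1],
          [1, 1, 100]]
print(greedy_path_sum(tricky), max_path_sum(tricky))  # 4 102

sample = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_sum(sample))  # 23
```

The greedy walk commits to the 2 in the second row and can then never reach the 100, which is the kind of situation a short lookahead cannot detect.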

I thought about problem 18 day and night and eventually solved it; I solved this one the same way.
P.S. My copy of 100_triangle.txt does not include the first line, '59'.
# Maximum path sum II
import time
def e67():
    start = time.time()
    f=open("100_triangle.txt")
    summ=[59]  # best sums for the previous row, seeded with the apex
    for s in f:
        slst=s.split()
        lst=[int(item) for item in slst]
        # add to each entry the better of its (up to two) parents
        for i in range(len(lst)):
            if i==0:
                lst[i]+=summ[i]
            elif i==len(lst)-1:
                lst[i]+=summ[i-1]
            elif (lst[i]+summ[i-1])>(lst[i]+summ[i]):
                lst[i]+=summ[i-1]
            else:
                lst[i]+=summ[i]
        summ=lst
    end = time.time() - start
    print("Runtime =", end)
    f.close()
    return max(summ)
print(e67()) #7273

Though starting from the bottom is more efficient, I wanted to see if I could implement Dijkstra's algorithm on this one; it works well and only takes a few seconds (didn't time it precisely):
from math import inf
f = open("p067_triangle.txt", "r")
tpyramid = f.read().splitlines()
f.close()
n = len(tpyramid)
pyramid = [[100 - int(tpyramid[i].split()[j]) for j in range(i+1)] for i in range(n)]
paths = [[inf for j in range(i+1)] for i in range(n)]
paths[0][0] = pyramid[0][0]
def mini_index(pyr):
    m = inf
    for i in range(n):
        mr = min([i for i in pyr[i] if i >= 0]+[inf])
        if mr < m:
            m, a, b = mr, i, pyr[i].index(mr)
    return m, a, b
counter = 0
omega = inf
while counter < n*(n+1)/2:
    min_weight, i, j = mini_index(paths)
    if i != n-1:
        paths[i+1][j] = min( paths[i+1][j], min_weight + pyramid[i+1][j])
        paths[i+1][j+1] = min( paths[i+1][j+1], min_weight + pyramid[i+1][j+1])
    else:
        omega = min(omega, min_weight)
    paths[i][j] = -1
    counter += 1
print(100*n - omega)
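The same shortest-path idea can also be sketched with a priority queue from the standard heapq module instead of rescanning the whole table for the minimum on every step. This is an illustrative variant, not the code above: the function name and the cap parameter are made up, and cap is assumed to be at least as large as every entry so that the converted weights cap - value are never negative (the answer above uses 100 for the same purpose).

```python
import heapq

def max_path_sum_dijkstra(triangle, cap=100):
    """Maximum top-to-bottom sum via Dijkstra on weights cap - value."""
    n = len(triangle)
    cost = [[cap - value for value in row] for row in triangle]
    best = {(0, 0): cost[0][0]}
    heap = [(cost[0][0], 0, 0)]
    while heap:
        dist, i, j = heapq.heappop(heap)
        if dist > best[(i, j)]:
            continue  # stale queue entry, a cheaper route was found later
        if i == n - 1:
            # Every path has n nodes, so the cheapest bottom node
            # corresponds to the largest original sum.
            return cap * n - dist
        for nj in (j, j + 1):
            new_dist = dist + cost[i + 1][nj]
            if new_dist < best.get((i + 1, nj), float("inf")):
                best[(i + 1, nj)] = new_dist
                heapq.heappush(heap, (new_dist, i + 1, nj))

print(max_path_sum_dijkstra([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]))  # 23
```

The bottom-up pass is still simpler and does less work per node; this variant is only meant to show the priority-queue form of the algorithm.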

Here is my solution. Indeed you have to take the bottom - up approach.
Result confirmed with PE. Thanks!
def get_triangle(listLink):
    triangle = [[int(number) for number in row.split()] for row in open(listLink)]
    return triangle
listOfLists = get_triangle('D:\\Development\\triangle.txt')
# fold each row into the one above it, starting from the second-to-last row
for i in range(len(listOfLists) - 2, -1, -1):
    for j in range(len(listOfLists[i])):
        listOfLists[i][j] += max(listOfLists[i+1][j], listOfLists[i+1][j+1])
print(listOfLists[0][0])
