I have three lists: scan, a list of distinct timestamps; focal, a list of a few timestamps, each repeated several times; and interval, which records the indices at which the timestamp in focal changes. The goal is to calculate the difference between the scan timestamps for each distinct timestamp in focal. I am a little stuck on how to use interval in a while loop for the calculations. Here is what I have so far:
def scanDuration():
    scan = scanTimeFMT()
    focal = startTimeFMT()
    focal.pop(0)
    interval = []
    times = []
    i = 0
    while i < len(scan):
        whole = scan[i]
        time = str(whole[11:])
        times.append(time)
        i += 1
    j = 1
    while j < len(focal)-1:
        if focal[j] != focal[j+1]:
            interval.append(j+1)
            j += 1
        else:
            j += 1
All help is appreciated, thank you!
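One possible way to approach this, as a rough sketch only: instead of storing the change indices first, walk through focal once and close a group whenever the value changes. The sketch assumes that scan and focal have the same length and line up index by index, and that the timestamps look like "YYYY-MM-DD HH:MM:SS" (a guess based on the whole[11:] slice). The function name group_durations and the fmt parameter are made up for illustration.

```python
from datetime import datetime

def group_durations(scan, focal, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (focal value, seconds from first to last scan) for each run
    of identical consecutive focal values.

    Assumes scan[i] belongs to focal[i]; fmt is an assumed timestamp format.
    """
    durations = []
    start = 0  # index where the current run of equal focal values began
    for k in range(1, len(focal) + 1):
        # A run ends at the end of the list or where the focal value changes.
        if k == len(focal) or focal[k] != focal[k - 1]:
            first = datetime.strptime(scan[start], fmt)
            last = datetime.strptime(scan[k - 1], fmt)
            durations.append((focal[start], (last - first).total_seconds()))
            start = k
    return durations
```

The values of k at which a run ends (other than the final one) are the same change indices that interval collects, so the same slicing could be driven from interval if that list is needed elsewhere.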
This has proven to be a challenging task for me, so I would really appreciate any help:
We have two columns in a data frame: start_time, end_time (both object type hh:mm:ss) which I converted into seconds (float64).
An example of our data (out of 20000 rows):
start_time=["00:01:14", "00:01:15", "00:01:30"]
end_time=["00:01:39", "00:02:25", "00:02:10"]
I am running the following code, but I am not convinced it's correct:
import pandas as pd

def findMaxPassengers(arrl, exit, n):  # define function
    arrl.sort()  # Sort arrival and exit arrays
    exit.sort()
    passengers_in = 1
    max_passengers = 1
    time = arrl[0]
    i = 1
    j = 0
    while (i < n and j < n):
        if (arrl[i] <= exit[j]):  # if the next event in sorted order is an arrival, then add 1
            passengers_in = passengers_in + 1
            # Update max_passengers if needed
            if (passengers_in > max_passengers):
                max_passengers = passengers_in
                time = arrl[i]
            i = i + 1
        else:  # otherwise the next event is an exit, so subtract 1
            passengers_in = passengers_in - 1
            j = j + 1
    print("Maximum Number of passengers =", max_passengers, "at time", time)

df = pd.read_excel("Venue_Capacity.xlsx")
arrl = list(df.loc[:, "start_time"])
exit = list(df.loc[:, "end_time"])
n = len(arrl)
findMaxPassengers(arrl, exit, n)
Is the thinking/code structure behind it correct?
I am not sure whether the way the code handles the times is right, that is, whether it is adding 1 or subtracting 1 at the correct moments. The code runs fine and gives:
Maximum Number of Passengers = 402 at time 12:12:09
but I am unable to check a dataset of 20000+ rows.
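One way to gain some confidence without inspecting 20000 rows by hand is to compare the sweep against a slow but obviously correct count on small inputs. The sketch below is only an illustration: max_overlap and max_overlap_bruteforce are hypothetical names, times are assumed to be in seconds already, and each start is assumed to be no later than its own end. As in the posted code, an arrival at the same instant as an exit is counted before the exit.

```python
def max_overlap(starts, ends):
    """Event sweep: return (peak number present, time the peak is first reached)."""
    # Kind 0 = arrival, 1 = exit, so arrivals sort before exits at equal times.
    events = sorted([(t, 0) for t in starts] + [(t, 1) for t in ends])
    current = best = 0
    best_time = None
    for t, kind in events:
        if kind == 0:
            current += 1
            if current > best:
                best, best_time = current, t
        else:
            current -= 1
    return best, best_time

def max_overlap_bruteforce(starts, ends):
    """O(n^2) reference: the peak must occur at some arrival time."""
    best, best_time = 0, None
    for t in sorted(starts):
        inside = sum(s <= t <= e for s, e in zip(starts, ends))
        if inside > best:
            best, best_time = inside, t
    return best, best_time

# The three example rows from the question, converted to seconds.
starts = [74, 75, 90]
ends = [99, 145, 130]
print(max_overlap(starts, ends), max_overlap_bruteforce(starts, ends))
```

If the two functions agree on many small random samples (or on random slices of the real data), that is decent evidence that the sweep logic is sound.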
I have a Pandas dataframe with ~100,000,000 rows and 3 columns (Names str, Time int, and Values float), which I compiled from ~500 CSV files using glob.glob(path + '/*.csv').
Given that two different names alternate, the job is to go through the data and count the number of times a value associated with a specific name ABC deviates from its preceding value by ±100, given that the previous 50 values for that name did not deviate by more than ±10.
I initially solved it with a for loop function that iterates through each row, as shown below. It checks for the correct name, then checks the stability of the previous values of that name, and finally adds one to the count if there is a large enough deviation.
count = 0
stabilityTime = 0
i = 0
# j holds the previous value seen for "ABC"
if names[0] == "ABC":
    j = values[0]
    stability = np.full(50, values[0])
else:
    j = values[1]
    stability = np.full(50, values[1])
for name in names:
    value = values[i]
    if name == "ABC":
        if j - 10 < value < j + 10:
            stabilityTime += 1
        if stabilityTime >= 50 and np.std(stability) < 10:
            if value > j + 100 or value < j - 100:
                stabilityTime = 0
                count += 1
        # slide the window of the last 50 "ABC" values
        stability = np.roll(stability, -1)
        stability[-1] = value
        j = value
    i += 1
Naturally, this process takes a very long computing time. I have looked at NumPy vectorization, but do not see how I can apply it in this case. Is there some way I can optimize this?
Thank you in advance for any advice!
Bonus points if you can give me a way to concatenate all the data from every CSV file in the directory that is faster than glob.glob(path + '/*.csv').
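For what it is worth, here is a rough sketch of how the rule as described in words could be expressed with pandas instead of a row loop. It is not a drop-in replacement for the loop above: it assumes "stable" means that each of the previous 50 step-to-step changes for the name is at most 10 in absolute value, which is not exactly what the stabilityTime counter and np.std check do. The function name count_jumps and its parameters are made up; the column names Names and Values come from the description.

```python
import numpy as np
import pandas as pd

def count_jumps(df, name="ABC", window=50, small=10, big=100):
    """Count steps larger than `big` that follow `window` steps no larger than `small`.

    Assumed interpretation of the rule; see the note above the code.
    """
    vals = df.loc[df["Names"] == name, "Values"]
    step = vals.diff().abs()  # change from the previous value of the same name
    # Largest change among the `window` steps that precede the current one.
    prev_max = step.shift(1).rolling(window).max()
    # Comparisons with NaN are False, so the first rows never count.
    return int(((step > big) & (prev_max <= small)).sum())
```

Filtering by name first and then using diff and rolling keeps all the work in vectorized code, so it should scale much better than iterating over every row, though I have not benchmarked it on data of this size.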
import random as rd

n = 0
ListOfStreaks = []
ListOfResults = []
while n != 10:
    numberOfStreaks = 0
    for i in range(100):
        Flip = rd.randint(0,1)
        ListOfResults.append(Flip)
    for i in range(96):
        count = 0
        for j in range(6):
            if ListOfResults[i] == ListOfResults[i + j]:
                count += 1
                if count == 6:
                    numberOfStreaks += 1
                    count = 0
                else:
                    continue
            else:
                break
    ListOfStreaks.append(numberOfStreaks)
    n += 1
print(ListOfStreaks)
print(len(ListOfResults))
In the code above, I am able to successfully flip a coin 100 times and examine how many times in the 100 flips Heads or Tails came up six times in a row. I am unable to properly set up the code to run the experiment 10 times in order to examine how many times Heads or Tails came up six times in a row in each of the single experiments. The goal is not to flip the coin 1,000 times in a row but to run 10 experiments of 100 flips each.
The exercise focuses on later being able to simulate the experiment 10,000 times in order to see what the probability is of Heads or Tails appearing six times in a row in 100 flips. Essentially, I am trying to gather enough of a sample size. While there are actual statistical/probability methods to get the exact answer, that isn't what I am trying to focus on.
Your key problem appears to be that you have ListOfResults = [] outside of your while loop, so each run adds another 100 entries to the list instead of setting up a new test.
I've replaced the initial for loop with a list comprehension which sets up a new sample each time.
import random as rd

list_of_streaks = []
for _ in range(10):
    # a fresh sample of 100 flips for every experiment
    list_of_results = [rd.randint(0, 1) for _ in range(100)]
    number_of_streaks = 0
    for i in range(95):  # only 95 full windows of six flips fit in 100 flips
        if sum(list_of_results[i: i+6]) in (0, 6):
            number_of_streaks += 1
    list_of_streaks.append(number_of_streaks)
print(list_of_streaks)
print(len(list_of_results))
You also don't need the inner for loop to add up all of the 6 flips - you can just sum them to see if the sum is 6 or 0. You appear to have just tested for heads - I tested for 6 identical flips, either heads or tails, but you can adjust that easily enough.
It's also much easier to use a for loop with a range, rather than while with a counter if you are iterating over a set number of iterations.
The first comment from @JonSG is also worth noting. If you had set up the individual test as a function, you'd have been forced to have ListOfResults = [] inside the function, so you would have got a new sample of 100 results each time. Something like:
import random as rd

def run_test():
    # the sample is local to the function, so every call starts fresh
    list_of_results = [rd.randint(0, 1) for _ in range(100)]
    number_of_streaks = 0
    for i in range(95):  # only 95 full windows of six flips fit in 100 flips
        if sum(list_of_results[i: i+6]) in (0, 6):
            number_of_streaks += 1
    return number_of_streaks

print([run_test() for _ in range(10)])
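Since the question mentions eventually running 10,000 experiments to estimate a probability, here is one possible sketch of that step. Note that it answers a slightly different question from the window count above: it checks whether an experiment contains at least one streak, rather than counting windows. The names has_streak and estimate_streak_probability, and the seed parameter, are my own inventions for illustration.

```python
import random

def has_streak(flips, length=6):
    """Return True if `flips` contains `length` identical results in a row."""
    run = 1  # length of the current run of identical flips
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

def estimate_streak_probability(trials=10_000, flips=100, length=6, seed=None):
    """Fraction of experiments that contain at least one streak."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    hits = sum(
        has_streak([rng.randint(0, 1) for _ in range(flips)], length)
        for _ in range(trials)
    )
    return hits / trials

print(estimate_streak_probability())
```

Keeping each experiment inside a function call means a fresh list of flips is built every time, which is the same point made above about run_test.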
import csv
import math
import time

start = time.time()
f = open('Speed_Test.csv','r+')
coordReader = csv.reader(f, delimiter = ',')
count = -1
successful_trip = 0
trips = 0
for line in coordReader:
    successful_single = 0
    count += 1
    R = interval*0.30
    if count == 0:
        continue  # skip the header row
    if 26 < float(line[0]) < 48.7537144 and 26 < float(line[2]) < 48.7537144 and -124.6521017 < float(line[1]) < -68 and -124.6521017 < float(line[3]) < -68:
        y2,x2,y1,x1 = convertCoordinates(float(line[0]),float(line[1]),float(line[2]),float(line[3]))
        coords_line,interval = main(y1,x1,y2,x2)
        for item in coords_line:
            loop_count = 0
            r = 0
            min_dist = 10000
            # find the closest location in df that lies within radius R
            for i in range(len(df)):
                dist = math.sqrt((item[1]-df.iloc[i,0])**2 + (item[0]-df.iloc[i,1])**2)
                if dist < R:
                    loop_count += 1
                    if dist < min_dist:
                        min_dist = dist
                        r = i
            if loop_count != 0:
                successful_single += 1
                df.iloc[r,2] += 1
        trips += 1
        if successful_single == (len(coords_line)):
            successful_trip += 1
end = time.time()
print('Percent Successful:',successful_trip/trips)
print((end - start))
I have this code, and explaining it in full would be extremely time-consuming, but it doesn't run as fast as I need it to in order to compute as much as I'd like. Is there anything anyone sees off the bat that I could do to speed the process up? Any suggestions would be greatly appreciated.
In essence, it reads in two latitude/longitude coordinates, converts them to Cartesian coordinates, and then steps through every coordinate along the path from the origin to the destination at interval lengths that depend on the distance. While doing this, it checks each point of the trip against a data frame (df) of 300+ coordinate locations, sees whether any of them is within radius R, and stores the closest one.
Take advantage of any opportunity to break out of a for loop once the result is known. For example, at the end of the for line loop you check to see if successful_single == len(coords_line). But that check can no longer succeed once the statement if loop_count != 0 is False for any item, because at that point successful_single does not get incremented, so you know that its value will never reach len(coords_line). So you could break out of the for item loop right there: you already know it's not a "successful_trip." There may be other situations like this.
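As a minimal sketch of that early exit, assuming the path points and the locations are plain (x, y) tuples in the same coordinate order (the function name trip_is_successful is made up):

```python
import math

def trip_is_successful(coords_line, locations, radius):
    """Return True only if every path point has a location within `radius`."""
    for px, py in coords_line:
        covered = any(
            math.hypot(px - lx, py - ly) < radius for lx, ly in locations
        )
        if not covered:
            # One uncovered point is enough: stop checking the rest of the path.
            return False
    return True
```

Be aware that this sketch only answers the yes/no question. The original loop also increments a per-location counter (df.iloc[r,2]) for every covered point, so if those counts matter for unsuccessful trips too, breaking early would change them.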
Have you considered pooling and running these calculations in parallel?
https://docs.python.org/2/library/multiprocessing.html
Note, however, that in your code R is computed from the interval returned for the previous row, which may create a dependency between rows and force them to be processed sequentially.
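Another option, separate from the suggestions above, is to replace the inner for i in range(len(df)) loop with a single NumPy computation over all locations. This is only a sketch under assumptions: locations is taken to be an (n, 2) array of x and y values in the same order as the path point, and nearest_within_radius is a hypothetical helper name.

```python
import numpy as np

def nearest_within_radius(point, locations, radius):
    """Return (index of the nearest location, number of locations within radius).

    Returns (None, 0) when no location lies within `radius` of `point`.
    """
    # Distances from the point to every location, computed in one call.
    d = np.hypot(locations[:, 0] - point[0], locations[:, 1] - point[1])
    within = d < radius
    if not within.any():
        return None, 0
    # If anything is within the radius, the overall nearest location is too.
    return int(np.argmin(d)), int(within.sum())

locations = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
print(nearest_within_radius((0.0, 1.0), locations, 5.0))
```

Pulling the coordinate columns out of the data frame once (for example with df.to_numpy()) and reusing the array avoids the repeated df.iloc lookups, which are usually slow inside Python loops.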
Think of the Unit Circle x 2. What I have done is create two lists, one for x and one for y, producing 500 pairs of random (x,y). Then I created r=x2+y2 in my while loop, where r is the radius and x2=x**2 and y2=y**2. What I want to be able to do is count the number of times r<=2. I assume my if statement needs to be in the while loop, but I don't know how to actually count the number of times the condition r<=2 is met. Do I need to create a list for the r values?
import random
from math import *

def randomgen(N):
    rlg1=[]
    rlg2=[]
    a=random.randint(0,N)
    b=float(a)/N
    return b

i=0
rlg=[]
rlg2=[]
countlist=[]
while i<500:
    x=randomgen(100)*2
    y=randomgen(100)*2
    x2=x**2
    y2=y**2
    r=x2+y2
    rlg.append(x)
    rlg2.append(y)
    print(rlg[i],rlg2[i])
    i+=1
    if r<=2:
        pass  # this is where I don't know how to count
import random
from math import *

def randomgen(N):
    rlg1=[]
    rlg2=[]
    a=random.randint(0,N)
    b=float(a)/N
    return b

i=0
rlg=[]
rlg2=[]
countlist=[]
amount = 0  # number of points with r <= 2
while i<500:
    x=randomgen(100)*2
    y=randomgen(100)*2
    x2=x**2
    y2=y**2
    r=x2+y2
    rlg.append(x)
    rlg2.append(y)
    print(rlg[i],rlg2[i])
    i+=1
    if r<=2:
        amount += 1
You need two counters here. One for the total number of points (i) and one for the number of points that lie within your circle r <= 2 (I'm calling this one isInside). You only want to increment the isInside counter if the point lies within your circle (r <= 2).
i = 0
rlg = []
rlg2 = []
countlist = []
isInside = 0
while i < 500:
    x=randomgen(100)*2
    y=randomgen(100)*2
    x2=x**2
    y2=y**2
    r=x2+y2
    rlg.append(x)
    rlg2.append(y)
    print(rlg[i],rlg2[i])
    i+=1
    if r <= 2:
        # increment your isInside counter
        isInside += 1
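To answer the last part of the question: no list of r values is needed just to count. A compact Python 3 sketch of the same idea, wrapped in a function so it can be rerun, might look like the following. The name count_inside and the limit and seed parameters are assumptions added for illustration; the condition x**2 + y**2 <= 2 is kept exactly as in the question, where r is really the squared distance from the origin.

```python
import random

def count_inside(n_points=500, limit=2.0, seed=None):
    """Count random points in [0, 2] x [0, 2] with x**2 + y**2 <= limit."""
    rng = random.Random(seed)  # a seed makes a run repeatable
    inside = 0
    for _ in range(n_points):
        # Same generator as randomgen(100) * 2: a multiple of 0.02 in [0, 2].
        x = rng.randint(0, 100) / 100 * 2
        y = rng.randint(0, 100) / 100 * 2
        if x**2 + y**2 <= limit:
            inside += 1
    return inside

print(count_inside())
```

A for loop over range(n_points) replaces the manual i counter, so only the one counter for hits remains.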