Grouping Timelapses by time difference algorithm - python

I'm trying to write a program that groups timelapse photos together based on their timestamps. The timelapse photos and other, unrelated photos are in one folder.
For example, suppose the timestamp differences in seconds between each photo and the previous one are: 346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438.
You can make a reasonable guess that the timelapse began at 13 and ended at 11.
Right now I'm trying a hacky solution that compares each difference with the previous one as a percentage.
But there has to be a formula or algorithm for grouping timestamps by time difference, such as a rolling mean.
Am I overlooking a simple solution?
Thank you!
def cat_algo(folder):
    # Get a list with all the CR2 files in the folder we are processing
    file_list = folder_to_file_list(folder)
    # Extract the timestamp out of the CR2 file into a sorted dictionary
    cr2_timestamp = collections.OrderedDict()
    for file in file_list:
        cr2_timestamp[file] = return_date_from_raw(file)
        print str(file) + " - METADATA TIMESTAMP: " + \
            str(return_date_from_raw(file))
    # Loop over the dictionary to compare the timestamps and create a new dictionary with a suspected group number per shot
    # Make sure we know that there is no first file yet using this (can be refactored)
    item_count = 1
    group_count = 0
    cr2_category = collections.OrderedDict()
    # get item and the next item out of the sorted dictionary
    for item, nextitem in zip(cr2_timestamp.items(), cr2_timestamp.items()[1::]):
        # if not the first CR2 file
        if item_count >= 2:
            current_date_stamp = item[1]
            next_date_stamp = nextitem[1]
            delta_previous = current_date_stamp - previous_date_stamp
            delta_next = next_date_stamp - current_date_stamp
            try:
                difference_score = int(delta_next.total_seconds() /
                                       delta_previous.total_seconds() * 100)
                print "diffscore: " + str(difference_score)
            except ZeroDivisionError:
                print "zde"
            if delta_previous > datetime.timedelta(minutes=5):
                # if difference_score < 20:
                print item[0] + " - hit - " + str(delta_previous)
                group_count += 1
                cr2_category[item[0]] = group_count
            else:
                cr2_category[item[0]] = group_count
            # create an algo to come up with percentage difference and use this to label timelapses.
            print int(delta_previous.total_seconds())
            print int(delta_next.total_seconds())
            # Calculations done, make the current date stamp the previous datestamp for the next iteration
            previous_date_stamp = current_date_stamp
            # If time difference with previous over X make a dict with name:number, in the end everything which has the
            # same number 5+ times in a row can be assumed as a timelapse.
        else:
            # If it is the first date stamp, assign it the current one to be used in the next loop
            previous_date_stamp = item[1]
        # To help make sure this is not the first image in the sequence.
        item_count += 1
    print cr2_category

If you use itertools.groupby with a key function that returns True when a delay meets your criterion for a timelapse region, you can get the start index and length of each such region from the list of delays. Basically, we're grouping on the True/False output of that function.
from itertools import groupby
# time differences given in original post
data = [346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438]
MAX_DELAY = 25 # timelapse regions will have a delay no larger than this
MIN_LENGTH = 3 # timelapse regions will have at least this many photos
index = 0
for timelapse, g in groupby(data, lambda x: x <= MAX_DELAY):
    length = len(list(g))
    if timelapse and length >= MIN_LENGTH:
        print('timelapse index {}, length {}'.format(index, length))
    index += length
output:
timelapse index 2, length 10
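To map the delay runs back to the photos themselves, here is a minimal sketch of the same groupby idea that starts from timestamps instead of delays. The function name timelapse_ranges and the min_photos parameter are made up for illustration, and timestamps are assumed to be sorted numbers of seconds. A run of k short delays spans k + 1 photos, which is why the indices are shifted by one at the end of each run.

```python
from itertools import accumulate, groupby

def timelapse_ranges(timestamps, max_delay=25, min_photos=4):
    """Return (first_photo, last_photo) index pairs, inclusive, for each
    run of photos whose consecutive delays are all <= max_delay."""
    # delays[j] is the gap between photo j and photo j + 1
    delays = [b - a for a, b in zip(timestamps, timestamps[1:])]
    ranges = []
    index = 0
    for is_short, group in groupby(delays, key=lambda d: d <= max_delay):
        length = len(list(group))
        # a run of `length` short delays covers `length + 1` photos
        if is_short and length + 1 >= min_photos:
            ranges.append((index, index + length))
        index += length
    return ranges

# rebuild timestamps from the delays in the question, starting at 0
delays = [346, 850, 13, 14, 13, 14, 15, 12, 12, 13, 16, 11, 438]
timestamps = [0] + list(accumulate(delays))
print(timelapse_ranges(timestamps))  # [(2, 12)]
```

With the data from the question this reports photos 2 through 12 (eleven photos) as one timelapse.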

Related

Binary search in Python results in an infinite loop

list = [27 , 39 , 56, 73, 3, 43, 15, 98, 21 , 84]
found = False
searchFailed = False
first = 0
last = len(list) - 1
searchValue = int(input("Which number are you looking for? "))
while not found and not searchFailed:
    mid = (first + last) // 2
    if list[mid] == searchValue:
        found = True
    else:
        if first >= last :
            searchFailed = True
        else:
            if list[mid] > searchValue:
                last = mid - 1
            else:
                last = mid + 1
if found:
    print("Your number was found at location", mid)
else:
    print("The number does not exist within the list")
The code runs properly when I execute it while searching for 27 (the first number), but any other number just results in an infinite loop.
I believe the loop runs smoothly on the first iteration since if I change the value of first to 1, the code correctly finds the position of 39 but repeats the infinite loop error with all the other numbers after that (while 27 "does not exist within the list", which makes sense). So I suppose the value of mid is not getting updated properly.
Several points to cover here. First, a binary search needs sorted data in order to work. As your list is not sorted, weirdness and hilarity may ensue :-)
Consider, for example, the unsorted [27 , 39 , 56, 73, 3, 43, 15, 98, 21] when you're looking for 39.
The first midpoint is at value 3, so a binary search will discard the left half entirely (including the 3) since it expects 39 to be to the right of that 3. Hence it will never find 39, despite the fact it's in the list.
If your list is unsorted, you're basically stuck with a sequential search.
Second, you should be changing first or last depending on the comparison. You change last in both cases, which won't end well.
Third, it's not usually a good idea to use standard data type names or functions as variable names. Because Python treats classes and functions as first-class objects, you can get into a situation where your bindings break things:
>>> a_tuple = (1, 2) ; a_tuple
(1, 2)
>>> list(a_tuple) # Works.
[1, 2]
>>> list = list(a_tuple) ; list # Works, unintended consequences.
[1, 2]
>>> another_list = list(a_tuple) # No longer works.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not callable
Covering those issues, your code would look something like this (slightly reorganised in the process):
my_list = [3, 15, 21, 27, 39, 43, 56, 73, 84, 98]
found = False
first, last = 0, len(my_list) - 1
searchValue = int(input("Which number are you looking for? "))
while not found:
    if first > last:
        break
    mid = (first + last) // 2
    if my_list[mid] == searchValue:
        found = True
    else:
        if my_list[mid] > searchValue:
            last = mid - 1
        else:
            first = mid + 1
if found:
    print("Your number was found at location", mid)
else:
    print("The number does not exist within the list")
That works, according to the following transcript:
pax> for i in {1..6}; do echo; python prog.py; done
Which number are you looking for? 3
Your number was found at location 0
Which number are you looking for? 39
Your number was found at location 4
Which number are you looking for? 98
Your number was found at location 9
Which number are you looking for? 1
The number does not exist within the list
Which number are you looking for? 40
The number does not exist within the list
Which number are you looking for? 99
The number does not exist within the list
First of all, do not use the name of a built-in (here list) for your variables. Secondly, you have a logical error in the following lines:
if list[mid] > searchValue:
    last = mid - 1
else:
    last = mid + 1
In the last line of the above snippet, it should be first = mid + 1
There are already very good answers to this question, but you can also consider this simpler version adapted to your case:
my_list = [3, 15, 21, 27, 39, 43, 56, 73, 84, 98] # sorted!
left, right = 0, len(my_list) # [left, right)
search_value = int(input("Which number are you looking for? "))
while left + 1 < right:
    mid = (left + right) // 2
    if my_list[mid] <= search_value:
        left = mid
    else:
        right = mid
if my_list[left] == search_value: # found!
    print("Your number was found at location", left)
else:
    print("The number does not exist within the list")
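If you don't need to write the loop yourself, the standard library's bisect module already does the binary search. The following is a small sketch rather than a drop-in replacement for the code above; the helper name index_of is made up for illustration, and it assumes the list is sorted in ascending order.

```python
from bisect import bisect_left

def index_of(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    # bisect_left gives the leftmost position where target could be inserted
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return -1

my_list = [3, 15, 21, 27, 39, 43, 56, 73, 84, 98]
print(index_of(my_list, 39))  # 4
print(index_of(my_list, 40))  # -1
```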
The problem with your function is that binary search needs the array or list to be SORTED; it is one of the most important principles of binary search. Here is a recursive version of the same function that works correctly on a sorted list:
#low is the first index and high is the last index, val is the value to find, list_ is the list, you can leave low as it is
def binary_search(list_: list, val, high: int, low: int = 0):
    if low > high:
        return -1 # range is empty, val is not in the list
    mid = (low+high)//2
    if list_[mid] == val:
        return mid
    elif list_[mid] < val:
        return binary_search(list_, val, high, mid+1)
    else:
        return binary_search(list_, val, mid-1, low)
and now here's the output
>>> list_ = [3, 15, 21, 27, 39, 43, 56, 73, 84, 98]
>>> binary_search(list_, 21, len(list_)-1)
2
What happens here is that the function first calculates the middle index of the current range, which for your ten-element list is 4. If the middle value is equal to the value being searched for, it returns that index. If the middle value is smaller than the value, then, because the list is sorted, the value can only be in the right half, so the function calls itself again with low set to mid+1. If the middle value is greater than the value, it calls itself with high set to mid-1 instead. Every call shrinks the range, and the process continues until the middle value is equal to the value, or until the range is empty (low > high), in which case the value is not in the list and -1 is returned.

Python - Count number of date/time iterations between 2 dates using freq

I am trying to figure out how to calculate the number of datetime steps between 2 dates at a specific frequency (1D, 3D, 3H, 15T).
For example:
freq = '3H'
start = datetime.datetime(2018, 8, 14, 9, 0, 0)
end = datetime.datetime(2018, 8, 15)
total = func(start, end, freq)
If freq = '3H', total would be 5.
If freq = '30T', total would be 30.
What would func look like?
EDIT
I'm leaving my original question up there, and adding details I failed to add originally in order to keep things as simple as possible.
In the code I am working on, I have a pandas DataFrame with a DatetimeIndex. I needed to calculate the number of rows since a specific time (above). I thought about creating a DataFrame starting from that time and filling in the gaps, and counting rows like that, but that seems silly now.
the function I ended up using (with the parsing) is this:
def periods(row, time_frame):
    start = datetime.datetime(2018, 8, 14, 9, 0, 0)
    end = row.name
    t = time_frame[-1:]
    n = int(re.findall(r'\d+', time_frame)[0])
    # compare strings with ==, not is
    if t == 'H':
        freq = datetime.timedelta(hours=n)
    elif t == 'T':
        freq = datetime.timedelta(minutes=n)
    else:
        freq = datetime.timedelta(days=n)
    count = 0
    while start < end:
        start += freq
        count += 1
    return count
and I call it from my dataframe(candlesticks) like this:
candlesticks['n'] = candlesticks.apply(lambda x: periods(x, candlesticks.index.freqstr), axis=1)
Use the timedelta class from the datetime module; from there it is essentially the same as comparing numbers.
from datetime import timedelta
freq = timedelta(hours=3)
def periods(frequency, start, end):
    count = 0
    while start < end:
        start += frequency
        count += 1
    return count
p = periods(freq, start, end)
print(p)
>> 5
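The loop can also be replaced by a single division, since dividing one timedelta by another with // gives an integer. This is a sketch under the assumption that you want the same result as the loop above, i.e. the number of steps needed for start to reach or pass end (a ceiling division). The name count_periods is made up for illustration.

```python
import datetime

def count_periods(start, end, freq):
    """Number of freq-sized steps needed for start to reach or pass end."""
    if end <= start:
        return 0
    # timedelta // timedelta yields an int; flooring the negative
    # difference and negating the result rounds the quotient up
    return -((start - end) // freq)

start = datetime.datetime(2018, 8, 14, 9, 0, 0)
end = datetime.datetime(2018, 8, 15)
print(count_periods(start, end, datetime.timedelta(hours=3)))     # 5
print(count_periods(start, end, datetime.timedelta(minutes=30)))  # 30
```

Unlike the loop, the cost does not grow with the number of periods, which matters when this is applied to every row of a DataFrame.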

Use pandas to count value greater than previous value

I am trying to count the number of times a value is greater than the previous value by 2.
I have tried
df['new'] = df.ms.gt(df.ms.shift())
and other similar lines but none give me what I need.
Might be less than elegant, but:
df['prev_ms'] = df['ms'].shift()
df['new'] = np.where((df['ms'] - df['prev_ms']) >= 2, 1, 0)
df['new'].sum()
Are you looking for diff? Find the difference between consecutive values and check that their difference is greater than, or equal to 2, then count rows that are True:
(df.ms.diff() >= 2).sum()
If you need to check if the difference is exactly 2, then change >= to ==:
(df.ms.diff() == 2).sum()
Since you need a specific difference, gt won't work. You could simply subtract and see if the difference is bigger than 2:
(df.ms - df.ms.shift() > 2).sum()
edit: changed to get you your answer instead of creating a new column. sum works here because it converts booleans to 1 and 0.
Your question was a little ambiguous, but if you want a program that counts the number of times a value is greater than the previous value by exactly 2 in pandas, here it is:
import pandas as pd
lst2 = [11, 13, 15, 35, 55, 66, 68] #list of int
dataframe = pd.DataFrame(list(lst2)) #converting into dataframe
count = 0 #we will count how many times n+1 is greater than n by 2
d = dataframe[0][0] #storing first index value to d
for i in range(len(dataframe)):
    #print(dataframe[0][i])
    d = d+2 #incrementing d by 2 to check if it is equal to the next index value
    if(d == dataframe[0][i]):
        count = count+1 #if n is less than n+1 by 2 then keep counting
    d = dataframe[0][i] #update d to the current value
print("total count ",count) #printing how many times n was less than n+1 by 2

subtract n values from input python

I haven't found anything even relevant to my question, so I may be asking it wrong.
I am working on an exercise where I am given sequential values starting at 1 and going to n, but not in order. I must find a missing value from the list.
My approach is to sum the full range 1 to n in a for loop, but I can't figure out how to read the n - 1 non-sequential values, each on its own line of input, and subtract them from that sum to get the missing one.
I have been searching for modifications to for loops, or just how to add n inputs of non-sequential numbers. If I am simply asking the wrong question, I am happy to do my own research if someone could point me in the right direction.
total = 0
for i in range (1 , (int(input())) + 1):
    total += i
print(total)
for s in **?????(int(input()))**:
    total -= s
print(total)
sample input:
5
3
2
5
1
expected output: 4
To fill in the approach you're using in your example code:
total = 0
n = int(input("How long is the sequence? "))
for i in range(1, n+1):
    total += i
for i in range(1, n):
    total -= int(input("Enter value {}: ".format(i)))
print("Missing value is: " + str(total))
That first for loop is unnecessary though. First of all, your loop is equivalent to the sum function:
total = sum(range(1,n+1))
But you can do away with any iteration altogether by using the formula:
total = n*(n+1)//2 # integer division keeps the result an int and avoids float rounding for large n
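Putting the formula and the subtraction together, a minimal sketch could look like this. The function name missing_value is made up for illustration, and it assumes exactly one value from 1 to n is missing from the input.

```python
def missing_value(n, values):
    """Return the one number from 1..n that is absent from values."""
    # sum of 1..n minus the sum of what was actually given
    return n * (n + 1) // 2 - sum(values)

# To read the values from standard input as in the exercise, one option is:
#   n = int(input())
#   values = [int(input()) for _ in range(n - 1)]
print(missing_value(5, [3, 2, 5, 1]))  # 4
```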
I don't know if you are supposed to create the initial data (with the missing item), so I added some lines to generate this sequence:
import random
n = 12 # or n = int(input('Enter n: ')) to get user input
# create a shuffled numeric sequence with one missing value
data = list(range(1,n+1))
data.remove(random.randrange(1,n+1))
random.shuffle(data)
print(data)
# create the corresponding reference sequence (without missing value)
data2 = list(range(1,n+1))
# find missing data with your algorithm
print("Missing value =", sum(data2)-sum(data))
Here is the output:
[12, 4, 11, 5, 2, 7, 1, 6, 8, 9, 10]
Missing value = 3

Binary search to find last element in sorted list that is less than specific value

I am searching through a dictionary of N messages, each with a unixtime, and I want to find the maximum number of messages (I call this the frequency) that fall inside any 24-hour (86400-second) time slot. For example, if there are five messages with a unixtime within 24 hours of a given message, I want 5.
I want to accomplish this with binary search, but I am not sure how best to implement it, or whether I can use a binary search library.
This is how I do it with a search grid of 10 elements:
cur.execute('SELECT unixtime FROM MessageType1 WHERE userID ='+str(userID[index])+' ORDER BY unixtime asc')
AISmessages = cur.fetchall()
AISmessages = {index:x[0] for index,x in enumerate(AISmessages)}
for nextMessageIndex in range(messageIndex+1, len(AISmessages),10):
    if AISmessages[nextMessageIndex] < message+(86400):
        #Count the number of occurrences
        frequency += 10
    elif AISmessages[nextMessageIndex-5] < message+(86400):
        if AISmessages[nextMessageIndex-2] < message+(86400):
            if AISmessages[nextMessageIndex-1] < message+(86400):
                frequency += 9
            else:
                frequency += 8
        elif AISmessages[nextMessageIndex-3] < message+(86400):
            frequency += 7
        elif AISmessages[nextMessageIndex-4] < message+(86400):
            frequency += 6
        else:
            frequency += 5
    elif AISmessages[nextMessageIndex-7] < message+(86400):
        if AISmessages[nextMessageIndex-6] < message+(86400):
            frequency += 4
        else:
            frequency += 3
    elif AISmessages[nextMessageIndex-9] < message+(86400):
        if AISmessages[nextMessageIndex-8] < message+(86400):
            frequency += 2
        else:
            frequency += 1
    else:
        break
I think I've screwed this one up as well, but I cannot figure out how. I know it does not work when, for example, the length of AISmessages isn't divisible by 10.
How would I generalize this to a binary search that gives me the frequency of messages inside a 24-hour time slot for a dictionary with any number of elements?
You can use bisect from the standard library. I'm not sure I understood your problem correctly, but assuming AISmessages is a sorted list of unixtimes (rather than the dict from your question), a solution may look something like this:
from bisect import bisect
frequency = bisect(AISmessages[messageIndex:], message+86400)
Example: This gives you the number of items in the list a with values in a range of 30, starting from the entry with index 2 (assuming a is sorted):
>>> from bisect import bisect
>>> a = [4, 17, 31, 39, 41, 80, 82, 85, 86, 96]
>>> i = 2
>>> m = a[i] # 31
>>> bisect(a[i:], m+30)
3 # correct: 31, 39, 41
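To get the maximum frequency over all messages, the same lookup can be repeated for every starting index. The sketch below is one way to do that, not the only one: the function name max_in_window is made up, times is assumed to be a sorted list of unixtimes, and it uses bisect_left with the lo argument so that the window is half-open (strictly less than t + window, matching the < comparisons in the question) and no slice copies are made.

```python
from bisect import bisect_left

def max_in_window(times, window=86400):
    """Largest number of entries of sorted `times` inside any half-open
    interval [t, t + window) that starts at one of the entries."""
    best = 0
    for i, t in enumerate(times):
        # first index at or after i whose value is >= t + window
        end = bisect_left(times, t + window, lo=i)
        best = max(best, end - i)
    return best

a = [4, 17, 31, 39, 41, 80, 82, 85, 86, 96]
print(max_in_window(a, window=30))  # 5
```

Here the best window starts at 80 and contains 80, 82, 85, 86 and 96.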
