Subtract or add time to web-scraped times - python

I'm working on a one-off script for myself to get sunset times for Friday and Saturday, in order to determine when Shabbat and Havdalah start. Now, I was able to scrape the times from timeanddate.com -- using BeautifulSoup -- and store them in a list. Unfortunately, I'm stuck with those times; what I would like to do is be able to subtract or add time to them. As Shabbat candle-lighting time is 18 minutes before sunset, I'd like to be able to take the given sunset time for Friday and subtract 18 minutes from it. Here is the code I have thus far:
import datetime
import requests
from BeautifulSoup import BeautifulSoup
# declare all the things here...
day = datetime.date.today().day
month = datetime.date.today().month
year = datetime.date.today().year
soup = BeautifulSoup(requests.get('http://www.timeanddate.com/worldclock/astronomy.html?n=43').text)
# worry not.
times = []
for row in soup('table',{'class':'spad'})[0].tbody('tr'):
    tds = row('td')
    times.append(tds[1].string)
#end for
shabbat_sunset = times[0]
havdalah_time = times[1]
So far, I'm stuck. The objects in times[] turn out to be BeautifulSoup NavigableString objects, which I can't convert to ints (for obvious reasons). Any help would be appreciated, and thank you so much.
EDIT
So, I used the suggestion of using mktime and making BeautifulSoup's string into a regular string. Now I'm getting an OverflowError: mktime out of range when I call mktime on shabbat...
for row in soup('table',{'class':'spad'})[0].tbody('tr'):
    tds = row('td')
    sunsetStr = "%s" % tds[2].text
    sunsetTime = strptime(sunsetStr,"%H:%M")
    shabbat = mktime(sunsetTime)
    candlelighting = mktime(sunsetTime) - 18 * 60
    havdalah = mktime(sunsetTime) + delta * 60

You should use datetime.timedelta.
For example:
time_you_want = datetime.datetime.now() + datetime.timedelta(minutes = 18)
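Applied to a scraped value, the same idea might look like the sketch below. It assumes the sunset time arrives as text such as "6:45 PM" (a made-up sample value) and that the NavigableString has already been converted with str(); the helper name shift_time is hypothetical.

```python
from datetime import datetime, timedelta


def shift_time(time_text, minutes, fmt="%I:%M %p"):
    """Parse a clock time, move it by `minutes` (negative = earlier), and format it again."""
    parsed = datetime.strptime(time_text, fmt)
    shifted = parsed + timedelta(minutes=minutes)
    return shifted.strftime(fmt)


# "6:45 PM" is an assumed sample value, not a real scraped time.
candle_lighting = shift_time("6:45 PM", -18)
print(candle_lighting)  # 06:27 PM
```

Parsing only the clock time puts the value on a placeholder date (1900-01-01), which is fine for adding or subtracting minutes but not for anything that depends on the real calendar day.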
Also see here:
Python Create unix timestamp five minutes in the future
Shalom Shabbat

The approach I'd take is to parse the complete time into a normal representation - in Python world, this representation is the number of seconds since the Unix epoch, 1 Jan 1970 midnight. To do this, you also need to look at column 0. (Incidentally, tds[1] is the sunrise time, not what I think you want.)
See below:
#!/usr/bin/env python
import requests
from BeautifulSoup import BeautifulSoup
from time import mktime, strptime, asctime, localtime
soup = BeautifulSoup(requests.get('http://www.timeanddate.com/worldclock/astronomy.html?n=43').text)
# worry not.
(shabbat, havdalah) = (None, None)
for row in soup('table',{'class':'spad'})[0].tbody('tr'):
    tds = row('td')
    sunsetStr = "%s %s" % (tds[0].text, tds[2].text)
    sunsetTime = strptime(sunsetStr, "%b %d, %Y %I:%M %p")
    if sunsetTime.tm_wday == 4: # Friday
        shabbat = mktime(sunsetTime) - 18 * 60
    elif sunsetTime.tm_wday == 5: # Saturday
        havdalah = mktime(sunsetTime)
print "Shabbat - 18 Minutes: %s" % asctime(localtime(shabbat))
print "Havdalah %s" % asctime(localtime(havdalah))
Second, help to help yourself: The 'tds' list is a list of BeautifulSoup.Tag. To get documentation on this object, open a Python terminal, type
import BeautifulSoup
help(BeautifulSoup.Tag)

Related

How to find the difference between two times

I'm trying to figure out a way to take two times from the same day and find the difference between them. So far, as shown in the code below, I split both time strings and convert the parts to ints. This works well, but when the clock-in minute value is higher than the clock-out minute value, the output has a negative number in the minutes slot.
My current code is:
from datetime import datetime
now = datetime.now()
clocked_in = now.strftime("%H:%M")
clocked_out = '18:10'
def calc_total_hours(clockedin, clockedout):
    in_hh, in_mm = map(int, clockedin.split(':'))
    out_hh, out_mm = map(int, clockedout.split(':'))
    hours = out_hh - in_hh
    mins = out_mm - in_mm
    return f"{hours}:{mins}"
print(calc_total_hours(clocked_in, clocked_out))
If the clocked-in value is 12:30 and the clocked-out value is 18:10,
the output is:
6:-20
The output needs to be converted back into a standard time format (H:M:S) when everything is done.
Thanks for your assistance, and sorry for the lack of quality code. I'm still learning! :D
First, to fix your code, you need to convert both times to minutes, compute the difference, and then convert it back to hours and minutes:
clocked_in = '12:30'
clocked_out = '18:10'
def calc_total_hours(clockedin, clockedout):
    in_hh, in_mm = map(int, clockedin.split(':'))
    out_hh, out_mm = map(int, clockedout.split(':'))
    diff = (in_hh * 60 + in_mm) - (out_hh * 60 + out_mm)
    hours, mins = divmod(abs(diff), 60)
    return f"{hours}:{mins}"
print(calc_total_hours(clocked_in, clocked_out))
# 5:40
Better way to implement the time difference:
import time
import datetime
t1 = datetime.datetime.now()
time.sleep(5)
t2 = datetime.datetime.now()
diff = t2 - t1
print(str(diff))
Output:
#h:mm:ss
0:00:05.013823
Probably the most reliable way is to represent the times as datetime objects, and then subtract one from the other, which will give you a timedelta.
from datetime import datetime
clock_in = datetime.now()
clock_out = clock_in.replace(hour=18, minute=10)
# total_seconds() returns a float, so convert to int before the integer division
seconds_diff = int(abs((clock_out - clock_in).total_seconds()))
hours, minutes = seconds_diff // 3600, (seconds_diff // 60) % 60
print(f"{hours}:{minutes:02}")
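If the two times start out as "HH:MM" strings, as in the question, one possible sketch is to parse both with datetime.strptime and subtract them. The function name time_between is made up for illustration, and both times are assumed to fall on the same day with clock-out after clock-in.

```python
from datetime import datetime, timedelta


def time_between(clocked_in, clocked_out, fmt="%H:%M"):
    """Return the elapsed time between two same-day clock strings as a timedelta."""
    start = datetime.strptime(clocked_in, fmt)
    end = datetime.strptime(clocked_out, fmt)
    return end - start


elapsed = time_between("12:30", "18:10")
print(elapsed)  # 5:40:00
```

Printing a timedelta of less than a day already gives the H:MM:SS form the question asks for, so no manual minute arithmetic is needed.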

how to display calculated timedelta as time in python

I'm calculating time stored in timewarrior via timew-report python library.
I'm adding up the time, which I'm able to do, and I'm trying to get the total to display as just hours:minutes:seconds, without days.
My script....
#!/usr/bin/python
import sys
import datetime
from datetime import timedelta
from timewreport.parser import TimeWarriorParser #https://github.com/lauft/timew-report
parser = TimeWarriorParser(sys.stdin)
total = datetime.datetime(1, 1, 1, 0, 0)
for interval in parser.get_intervals():
    duration = interval.get_duration()
    print(duration)
    total = total + duration
print(total)
...works properly, returning:
0:01:09
0:06:03
7:00:00
0:12:52
20:00:00
0001-01-02 03:20:04
...but instead of showing 0001-01-02 03:20:04 I'd like it to say 27:20:04.
How do I get it to be formatted like that?
Am I taking the wrong approach by initializing total like datetime.datetime(1, 1, 1, 0, 0)?
On the assumption that interval.get_duration is returning a datetime.timedelta object each time, you can just add these to an existing datetime.timedelta object, and then do the arithmetic to convert to HH:MM:SS format at the end. (You will need to do your own arithmetic because the default string representation for timedelta will use days and HH:MM:SS if the value exceeds 24 hours, which you don't want.)
For example:
import datetime
total = datetime.timedelta(0)
for interval in parser.get_intervals():
    duration = interval.get_duration()
    total += duration
total_secs = int(total.total_seconds())
secs = total_secs % 60
mins = (total_secs // 60) % 60
hours = (total_secs // 3600)
print("{hours}:{mins:02}:{secs:02}".format(hours=hours, mins=mins, secs=secs))
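To try the arithmetic without timewarrior installed, here is a self-contained sketch. The helper name format_hms is hypothetical, the durations are the sample values from the question, and a non-negative total is assumed.

```python
from datetime import timedelta


def format_hms(delta):
    """Format a non-negative timedelta as H:MM:SS, letting hours exceed 24."""
    total_secs = int(delta.total_seconds())
    hours, remainder = divmod(total_secs, 3600)
    mins, secs = divmod(remainder, 60)
    return "{}:{:02}:{:02}".format(hours, mins, secs)


# Sample durations copied from the output shown in the question.
durations = [
    timedelta(minutes=1, seconds=9),
    timedelta(minutes=6, seconds=3),
    timedelta(hours=7),
    timedelta(minutes=12, seconds=52),
    timedelta(hours=20),
]
total = sum(durations, timedelta(0))
print(format_hms(total))  # 27:20:04
```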
For anyone interested in this timewarrior report, here's my final working code. Put this in a file named scriptname in the timewarrior extensions directory, then invoke it as timew scriptname tagname to see a time report showing annotations and total uninvoiced time for a given tag (it can also be used without a tag to display all time entries).
#!/usr/bin/python
import sys
import datetime
from datetime import timedelta
from timewreport.parser import TimeWarriorParser #https://github.com/lauft/timew-report
parser = TimeWarriorParser(sys.stdin)
total = datetime.timedelta()
tags = ''
for interval in parser.get_intervals():
    tags = interval.get_tags()
    # this report shows only un-invoiced time, so we ignore "invoiced" time entries
    if 'invoiced' not in tags:
        # hide 'client' and 'work' tags since they clutter this report
        if 'clients' in tags:
            tags.remove('clients')
        if 'work' in tags:
            tags.remove('work')
        date = interval.get_start()
        duration = interval.get_duration()
        ant = interval.get_annotation()
        sep = ', '
        taglist = sep.join(tags)
        output = str(date.date()) + ' - ' + str(duration) + ' - ' + taglist
        if ant:
            output += ' ||| ' + str(ant)
        print(output)
        total = total + duration
print('----------------')
# We calculate the time out like this manually because we don't want numbers of hours greater than 24 to be presented as days
total_secs = int(total.total_seconds())
secs = total_secs % 60
mins = (total_secs // 60) % 60
hours = (total_secs // 3600)
# for new versions of python 3.6 and up the following could work
# print(f"{hours}:{mins:02}:{secs:02}")
# but for older python this works...
print("total = {hours}:{mins:02}:{secs:02}".format(hours=hours, mins=mins, secs=secs))
Pass the total seconds to datetime.timedelta, like below (note that str() of a timedelta of 24 hours or more still includes a day count):
total = your_total.timestamp()
total_time = datetime.timedelta(seconds=total)
str(total_time)

Get the time since a file was last accessed in Python

I'm trying to make a script, that takes a folder as input, and deletes files older than one week.
For some reason, my program does not output expected values.
I used:
os.stat('testFile1.txt').st_mtime
os.stat('testFile1.txt').st_atime
I expected atime to return the time the file was last accessed, and mtime last modification, in seconds.
I get a really high number on both, even though I have just opened a file.
Am I doing something wrong? Should I use another method to get the time?
The number you are getting is a timestamp in Unix format. It represents the number of seconds since the start of the year 1970 in UTC (that's why it's so big).
In order to convert it to something more usable, you can use datetime.fromtimestamp():
import os
from datetime import datetime
filename = "testFile1.txt"
file_stat = os.stat(filename)
last_modification = datetime.fromtimestamp(file_stat.st_mtime)
last_access = datetime.fromtimestamp(file_stat.st_atime)
The time you are getting here is not "since the last change". In order to get the amount of time that has passed since a modification or access, you'll need to subtract the modification / access time from the current time:
current_time = datetime.now()
time_since_last_modification = current_time - last_modification
time_since_last_access = current_time - last_access
The code above results in two timedelta objects. In your application, you will need to convert those to days, which is trivial:
days_since_last_modification = time_since_last_modification.days
days_since_last_access = time_since_last_access.days
Whole code
To summarize, this code:
import os
from datetime import datetime
filename = "testFile1.txt"
file_stat = os.stat(filename)
last_modification = datetime.fromtimestamp(file_stat.st_mtime)
last_access = datetime.fromtimestamp(file_stat.st_atime)
current_time = datetime.now()
time_since_last_modification = current_time - last_modification
time_since_last_access = current_time - last_access
days_since_last_modification = time_since_last_modification.days
days_since_last_access = time_since_last_access.days
msg = "{} was modified {} days ago, with last access {} days ago"
msg = msg.format(filename, days_since_last_modification,
                 days_since_last_access)
print(msg)
Will output something along these lines:
testFile1.txt was modified 4 days ago, with last access 2 days ago
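For the original goal (removing files older than one week from a folder), a minimal sketch could compare the raw timestamps directly instead of converting them first. The function name find_old_files and the seven-day default are assumptions for illustration, and the sketch only lists candidates rather than deleting anything.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def find_old_files(folder, max_age_days=7, now=None):
    """Return paths of regular files in `folder` last modified more than `max_age_days` ago."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    old_files = []
    for entry in os.scandir(folder):
        # st_mtime is seconds since the Unix epoch, so it compares directly with time.time()
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            old_files.append(entry.path)
    return sorted(old_files)


if __name__ == "__main__":
    for path in find_old_files("."):
        # Dry run: swap the print for os.remove(path) once the list looks right.
        print("would delete:", path)
```

Using st_mtime rather than st_atime is a deliberate choice here, since access times are often not updated reliably, depending on how the filesystem is mounted.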
I believe you are not converting the epoch time to a datetime.
From https://docs.python.org/2/library/stat.html
stat.ST_ATIME - Time of last access
time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(os.stat('/tmp/test.txt').st_atime))
>>>'2018-01-15 14:51:23'
stat.ST_MTIME - Time of last modification
time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(os.stat('/tmp/test.txt').st_mtime))
>>>'2018-01-15 14:51:25'
You can even check metadata changes to a file using stat.ST_CTIME.
For clarification:
st_atime | time of last access
st_ctime | time of last change
st_mtime | time of last modification
You are getting timestamps. I guess you want it as datetime instead?
From that you can use timedelta to find out how old a datetime is.
import os
import datetime
datetime.datetime.fromtimestamp(os.stat("test").st_mtime)
datetime.datetime.now() - datetime.datetime.fromtimestamp(os.stat("test").st_mtime)
Gives output:
datetime.datetime(2018, 1, 11, 8, 23, 23, 913330)
datetime.timedelta(4, 7252, 17055)
From input data:
drobban#xps:~/Desktop$ ls -lrt test
-rw-rw-r-- 1 drobban drobban 0 jan 11 08:23 test

Python Google Search: Hits within date range are inaccurate

I've been trying to write code to scrape the number of hits within a certain date range on google. I've done this by inserting the date into the google search query. When I copy and paste the link it produces, it gives me the correct query, but when the code runs it, I keep getting the number of hits for the search without the date range. I'm not sure what I'm doing wrong here.
from bs4 import BeautifulSoup
import requests
import re
from datetime import date, timedelta
day = date.today()
friday = day - timedelta(days=day.weekday() + 3) + timedelta(days=7)
word = "debt"
for n in range(0,32,7):
    date_end = friday - timedelta(days=n)
    date_beg = date_end - timedelta(days=4)
    link_beg = "https://www.google.com/search?q=%s&source=lnt&tbs=cdr%%3A1%%2Ccd_min%%3A" % (word)
    link_date = "%s%%2F%s%%2F%s%%2Ccd_max%%3A%s%%2F%s%%2F%s&tbm=&gws_rd=ssl" % (str(date_beg.month),str(date_beg.day),str(date_beg.year),str(date_end.month),str(date_end.day),str(date_end.year))
    url = link_beg + link_date
    print url,
    print "\t",
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    products = soup.findAll("div", id = "resultStats")
    result = str(products[0])
    results = re.findall(r'\d+', result)
    number = ''.join([str(i) for i in results])
    print number
For example, one of the links that is produced is this:
Google Search for "debt" in date range "3/9/2015 to 3/13/2015"
The hits produced should be: 39,700,000
But instead, it spits out: 293,000,000 (which is what just a generic search produces)
Google's date-range-limited search relies on Julian dates, i.e. the range must be specified as Julian dates. Perhaps you realized this already.
For example: cute kitties daterange:[some Julian date]-[another Julian date] (without the brackets).
There are web pages to convert to Julian, or use the jDate Python script or jday shell script.
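The conversion can also be done with the standard library alone. The sketch below relies on the fixed offset between Python's date.toordinal() and the Julian Day Number; the helper names are hypothetical, and it assumes the daterange: operator takes whole Julian day numbers.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal and the Julian Day Number
# (2000-01-01 has ordinal 730120 and Julian Day Number 2451545).
JDN_OFFSET = 1721425


def julian_day_number(day):
    """Return the integer Julian Day Number for a datetime.date."""
    return day.toordinal() + JDN_OFFSET


def daterange_term(start, end):
    """Build a 'daterange:' search term from two dates (assumed operator syntax)."""
    return "daterange:{}-{}".format(julian_day_number(start), julian_day_number(end))


print(daterange_term(date(2015, 3, 9), date(2015, 3, 13)))
# daterange:2457091-2457095
```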

Formatting time from Google Calendar API with Python

I am trying to get an easy to read time format to list events from google calendar for the current day. I can pull in the data, but I'm having a problem formatting the data to be just the Hour and minute for both start time and end time.
I want to display the information in an easy to read list, so I want to drop the date and seconds and only display the time in order. I have tried several different methods including slicing and trying to convert into date time with no luck.
date = datetime.datetime.now()
tomorrow = date.today() + datetime.timedelta(days=2)
yesterday = date.today() - datetime.timedelta(days=1)
now = str
data = '{:%Y-%m-%d}'.format(date)
tdata = '{:%Y-%m-%d}'.format(tomorrow)
ydata = '{:%Y-%m-%d}'.format(yesterday)
def DateQuery(calendar_service, start_date=data, end_date=tdata):
    print 'Date query for events on Primary Calendar: %s to %s' % (start_date, end_date,)
    query = gdata.calendar.service.CalendarEventQuery('default', 'private', 'full')
    query.start_min = start_date
    query.start_max = end_date
    feed = calendar_service.CalendarQuery(query)
    for i, an_event in enumerate(feed.entry):
        print '\'%s\'' % (an_event.title.text)
        for a_when in an_event.when:
            dstime = (a_when.start_time,)
            detime = (a_when.end_time,)
            print '\t\tStart time: %s' % (dstime)
            print '\t\tEnd time: %s' % (detime)
It prints like this
End time: 2013-03-23T04:00:00.000-05:00
and I would prefer it be
End time: 04:00
Using the dateutil module:
>>> import dateutil.parser
>>> dateutil.parser.parse('2013-03-23T04:00:00.000-05:00')
>>> dt = dateutil.parser.parse('2013-03-23T04:00:00.000-05:00')
>>> dt.strftime('%I:%M')
'04:00'
If you don't want to use dateutil, you can also parse the string using the specific format with strptime.
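A sketch of the strptime route, assuming Python 3.7 or later (where %z accepts an offset written with a colon, such as -05:00). The function name hour_minute is made up for illustration.

```python
from datetime import datetime


def hour_minute(timestamp):
    """Return just HH:MM from a timestamp like 2013-03-23T04:00:00.000-05:00."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.strftime("%H:%M")


print(hour_minute("2013-03-23T04:00:00.000-05:00"))  # 04:00
```

The result is the wall-clock time in the offset carried by the string itself; no conversion to the local time zone takes place.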
