I'm using pandas to generate a list of dates and times within a specified range so I can query an API. My aim is to query weeks or months of data on a per-hour basis.
time_range = pd.date_range('20180601T07:00:0000', '20180701T07:00:0000', freq='H')
time_range = time_range.strftime("%Y%m%d"+'T%H:00-0000')
yields a list of times in the desired format. Where I'm encountering difficulty is that the URL is formatted...
startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
I understand I need to start with values 0 & 1 from the pandas list but I don't know how to cycle through. Should I be thinking of a dictionary like...
{date1:[hour1, hour2, etc...], date2:[hour1, hour2, etc...], ...}
and use a .format where startdatetime={1}&enddatetime={2} ?
or should it be more like a for loop...
for date in date_range:
    url = 'http://somename?startdatetime={date}&enddatetime{date2}'
    urldate = url.format(date=date)
    urldate2 = url.format(date2=date + 1)
Any help is appreciated!
If I understand correctly, you want to iterate from a starting date/time (2018-06-01 07:00) to an ending date/time (2018-07-01 07:00) in steps of one hour, and produce a URL for each one-hour interval.
You don't need pandas for that; you can do it with the standard library, like this:
import datetime
start = datetime.datetime(2018, 6, 1, 7)
end = datetime.datetime(2018, 7, 1, 7)
delta = datetime.timedelta(hours=1)
# The "T" separator and the "=" after enddatetime match the URL format in the question.
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
while start < end:
    date1 = start
    date2 = start + delta
    url = fmt.format(date1=date1, date2=date2)
    print(url)
    start = date2
You get:
http://somename?startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
http://somename?startdatetime=20180601T08:00-0000&enddatetime=20180601T09:00-0000
http://somename?startdatetime=20180601T09:00-0000&enddatetime=20180601T10:00-0000
http://somename?startdatetime=20180601T10:00-0000&enddatetime=20180601T11:00-0000
...
In the loop, I work with datetime instances. I use a format string, like "{date2:%Y%m%dT%H:00-0000}", to format the date and time in the required format.
Notice that the date_range() function is easy to implement with the standard library:
def date_range(start, end, delta):
    while start < end:
        yield start
        start = start + delta
To get the list of dates with an interval of one hour, you can do:
dates = list(date_range(
    datetime.datetime(2018, 6, 1, 7),
    datetime.datetime(2018, 7, 1, 7),
    datetime.timedelta(hours=1)))
Then, the solution becomes:
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
for date1, date2 in zip(dates[:-1], dates[1:]):
    url = fmt.format(date1=date1, date2=date2)
    print(url)
The trick is to zip() the list of dates with a copy of itself shifted by one item, which gives you pairs of consecutive dates.
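If you would rather not build the whole list first, the same idea can be written as a generator that yields one URL per hourly window. This is only a sketch: the endpoint is the placeholder from the question, and the names hourly_urls, URL and STAMP are made up for illustration. The timestamp layout is assumed to follow the question's example, 20180601T07:00-0000.

```python
from datetime import datetime, timedelta

# Placeholder endpoint from the question; STAMP is assumed to match the
# "20180601T07:00-0000" layout shown there.
URL = 'http://somename?startdatetime={}&enddatetime={}'
STAMP = '%Y%m%dT%H:00-0000'

def hourly_urls(start, end, step=timedelta(hours=1)):
    """Yield one URL per window of length `step` between start and end."""
    current = start
    while current < end:
        nxt = current + step
        yield URL.format(current.strftime(STAMP), nxt.strftime(STAMP))
        current = nxt

urls = list(hourly_urls(datetime(2018, 6, 1, 7), datetime(2018, 7, 1, 7)))
print(urls[0])
```

Because it is a generator, you can also loop over hourly_urls(...) directly and issue each request as the URL is produced.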
Okay, so I am relatively new to programming and this has me absolutely stumped. I'm scraping data from a website and the data changes every week. I want to run my scraping process for each weekly change, starting back on 09-09-2015 and running up to the present.
I know how to do this easily by running through every number, like 0909, then 0910, then 0911, but that is not what I need, as it would send far too many pointless requests to the server.
Here is the format of the URL
http://www.myexamplesite.com/?date=09092015
I know the simple:
for i in range(startDate, endDate):
    url = 'http://www.myexamplesite.com/?date={}'.format(i)
    driver.get(url)
But one thing I've never been able to figure out is how to manipulate Python's datetime to accurately reflect the format the website uses.
i.e.:
09092015
09162015
09232015
09302015
10072015
...
09272017
If all else fails, I only need to do this once, so it wouldn't take too long to skip the loop altogether, manually enter each date I wish to scrape, and then append all of my dataframes together. I'm mainly curious about how to manipulate datetime in this way for future projects that may require more data.
A good place to start is the documentation for the datetime, date and timedelta objects.
First, let's construct our starting date and ending date (today):
>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))
Now let's define the unit of increment -- one day:
>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:
>>> start + day
datetime.date(2015, 9, 10)
We can also use format() to get that date in the format the website uses (month, day, year):
>>> "{date.month:02}{date.day:02}{date.year}".format(date=start+day)
'09102015'
So, when we put all this together:
from datetime import date, timedelta
start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)
mydate = start
while mydate < end:
    print("{date.month:02}{date.day:02}{date.year}".format(date=mydate))
    mydate += week
we get a simple iteration over dates starting with 2015-09-09 and ending with today, incremented by 7 days (a week):
09092015
09162015
09232015
09302015
10072015
...
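For the scraping loop itself, here is one possible sketch that produces the URLs directly. It uses strftime('%m%d%Y') so the dates come out month-first, as in the question's example URLs. The function name weekly_urls and the BASE_URL constant are made up for illustration, and the end date is fixed to the last date listed in the question rather than date.today().

```python
from datetime import date, timedelta

# URL pattern from the question; '%m%d%Y' gives month, day, four-digit year.
BASE_URL = 'http://www.myexamplesite.com/?date={}'

def weekly_urls(start, end, step=timedelta(days=7)):
    """Yield one URL per step from start up to and including end."""
    current = start
    while current <= end:
        yield BASE_URL.format(current.strftime('%m%d%Y'))
        current += step

urls = list(weekly_urls(date(2015, 9, 9), date(2017, 9, 27)))
print(urls[0])
print(urls[-1])
```

Each URL can then be passed to driver.get() inside the loop instead of being collected in a list.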
Take a look at the strftime() and strptime() behavior section of the datetime documentation:
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
It has a table of the format codes for dates and times, along with their usage.
Of course, if the format of the dates changes in the future or you are parsing different strings, you will have to make code changes. There really is no way around that.
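As a small illustration of those format codes, the sketch below parses one of the site's date strings and formats it back. The SITE_FORMAT constant is an assumption based on the question's examples (month, day, four-digit year); if the site changes its format, this is the one line you would edit.

```python
from datetime import datetime

# Assumed layout: month, day, four-digit year, as in "09162015".
SITE_FORMAT = '%m%d%Y'

parsed = datetime.strptime('09162015', SITE_FORMAT).date()
as_iso = parsed.isoformat()                  # ISO 8601 string
round_trip = parsed.strftime(SITE_FORMAT)    # back to the site's layout

print(as_iso)
print(round_trip)
```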
In pandas I have a variable defined as
start_date = date (2020,7,1)
How do I update it to be the next day? I have a dataframe and I am filtering on individual days but I want to iterate through a full time range. I suppose I could have a for loop like so
x = 1
while x < 10:
    start_date = date(2020, 7, x)
    x += 1
But is there another way? I couldn't find any other stack exchange questions for python dates.
Assuming date is the regular Python one, you can add a day as follows:
from datetime import date, timedelta
start_date = date(2020, 7, 1)
next_date = start_date + timedelta(days=1)
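To walk through a whole range this way, you can keep adding the timedelta in a loop. A minimal sketch; the helper name each_day is made up, and the stop date is treated as exclusive (an assumption, mirroring how range() behaves):

```python
from datetime import date, timedelta

def each_day(start, stop):
    """Yield every date from start up to, but not including, stop."""
    current = start
    while current < stop:
        yield current
        current += timedelta(days=1)

# Month boundaries are handled by date arithmetic, not by counting day numbers.
days = list(each_day(date(2020, 7, 30), date(2020, 8, 2)))
print(days)
```

Inside the loop you would filter the dataframe on the current date instead of collecting the dates in a list.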
Interestingly, I have searched a lot of questions but I cannot find just a simple answer to this question. Or I do find an answer but it won't allow me the flexibility to alter the format of the dates I require.
If I have a specified start and end date like this:
start = '2015-08-01' # YYYY-MM-DD
end = '2020-07-06'
Is there a simple way using datetime in Python to create a list of dates between these dates that adhere to this YYYY-MM-DD format? And if so, how can I subsequently reverse this list so that list[0] is equal to today?
Here's a way using a list comprehension, which is usually a little faster than an explicit loop and doesn't require any external libraries. Note that date.fromisoformat() needs Python 3.7 or later.
from datetime import date, timedelta
start = '2015-08-01'
end = '2020-07-06'
start_date = date.fromisoformat(start)
end_date = date.fromisoformat(end)
# Counts back from end_date; start_date itself is excluded (use .days + 1 to include it).
date_range = [
    # end_date - timedelta(days=i)  # For date objects
    (end_date - timedelta(days=i)).isoformat()  # For ISO-8601 strings
    for i
    in range((end_date - start_date).days)
]
reverse_range = list(reversed(date_range))
print(date_range[0])
print(reverse_range[0])
Output
2020-07-06
2015-08-02
You can also use pandas:
import pandas as pd
start = '2015-08-01' # YYYY-MM-DD
end = '2020-07-06'
pd.date_range(start, end)
# to start from today
pd.date_range(pd.Timestamp.today(), end)
You can also create a range with your desired frequency:
pd.date_range(start, end, freq='14D') # every 14 days
pd.date_range(start, end, freq='H') # hourly (spelled 'h' in pandas 2.2 and later), and so on
datetime.timedelta will help here. Try this:
import datetime
dates = []
d = datetime.date(2015,8,1)
while d <= datetime.date(2020,7,6):
    dates.append(datetime.datetime.strftime(d,'%Y-%m-%d'))
    d += datetime.timedelta(days=1)
This will populate the list dates, which will look like this:
['2015-08-01', '2015-08-02', '2015-08-03', .... , '2020-07-04', '2020-07-05', '2020-07-06']
EDIT:
Just use dates.append(d) instead of dates.append(datetime.datetime.strftime(d,'%Y-%m-%d')) to get a list of datetime.date objects instead of strings.
Reversing a list is pretty straight-forward in Python:
dates = dates[::-1]
After the above, dates[0] will be '2020-07-06'.
Something like this?
import datetime
def date_range(start, end):
    # Number of days in the range, including both endpoints
    r = (end + datetime.timedelta(days=1) - start).days
    return [start + datetime.timedelta(days=i) for i in range(r)]
start = datetime.date(2015, 1, 1)
end = datetime.date(2020, 7, 6)
dateList = date_range(start, end)
print('\n'.join([str(date) for date in dateList]))
I have a basic dataframe that is read into pandas, with a few rows of existing data that don't matter much.
df = pd.read_csv('myfile.csv')
df['Date'] = pd.to_datetime(df['Date'])
I need to come up with a method that will let me loop between two dates and add these as new rows. The dates follow a cycle: 21 days out of every 28. So if the start date was 4/1/13 and my end date was 6/1/19, I want to add a row for each date, 21 days on and then a week off.
Desired output:
A, Date
x, 4/1/13
x, 4/2/13
x, 4/3/13
x, 4/4/13
x, 4/5/13
... cont'd
x, 4/21/13
y, 4/29/13
y, 4/30/13
... cont'd
You can see that between x and y there was a new cycle.
I think I am supposed to use Datetime for this but please correct me if I am wrong. I am not sure where to start.
EDIT
I started with this:
import datetime
# The size of each step in days
day_delta = datetime.timedelta(days=1)
start_date = datetime.date(2013, 4, 1)
end_date = start_date + 21*day_delta
for i in range((end_date - start_date).days):
    print(start_date + i*day_delta)
And got this:
2013-04-01
2013-04-02
2013-04-03
2013-04-04
2013-04-05
2013-04-06
2013-04-07
2013-04-08
2013-04-09
2013-04-10
2013-04-11
2013-04-12
2013-04-13
2013-04-14
2013-04-15
2013-04-16
2013-04-17
2013-04-18
2013-04-19
2013-04-20
2013-04-21
But I am not sure how to implement the cycle in here.
TYIA!
Interesting question, I spent almost half an hour on this.
Yes, you will need the datetime module for this.
base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(100)]
I made a list of dates as you did. This is a list of datetime.datetime objects. I recommend you convert all your dates into this format to make calculations easier. We set a base date (the first day) to compare with the rest later on in a loop.
date_list_filtered = []
for each in date_list:
    date_list_filtered.append(each.strftime('%d/%m/%y'))
strftime() changes the datetime.datetime object into a readable date, my own preference is using the dd/mm/yy format. You can look up different formats online.
df = pd.DataFrame({'Raw':date_list,'Date':date_list_filtered})
Here I made a loop to count the difference in days between each date in the loop and the base date, changing the base date every time it hits -21.
Edit: Oops I did 21 days instead of 28, but I'm sure you can tweak it
import string
import numpy as np
base = df['Raw'][0]
unique_list = []
no21 = 0
for date in df['Raw'].values:
    try:
        res = (date-base).days
    except AttributeError:
        # numpy timedelta64 has no .days attribute, so convert to a day count
        res = (date-base).astype('timedelta64[D]')/np.timedelta64(1, 'D')
    if res==-21.0:
        base = date
        #print(res)
        unique_list.append(string.ascii_letters[no21])
        no21+=1
    else:
        unique_list.append(string.ascii_letters[no21])
I used the string library to get the unique letters I wanted.
Lastly, put it in the data frame.
df['Unique'] = unique_list
Thanks for asking this question, it was really fun.
You can floor divide the difference in days from the start date by 28 to get the number of cycles.
date_start = datetime.datetime(2013, 4, 1)
date1 = datetime.datetime(2013, 5, 26)
And to check the difference
diff_days = (date1-date_start).days
diff_days
55
cycle = (date1-date_start).days//28
cycle
1
Then you can sum over the dates within the same cycle.
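Building on the floor-division idea, the modulo of the same day offset tells you whether a date falls in the "on" part of its cycle. The sketch below is one way to generate the 21-on, 7-off dates with their cycle number; the function name cycle_dates and its parameters are made up for illustration, and the end date is treated as inclusive.

```python
from datetime import date, timedelta

def cycle_dates(start, end, on_days=21, cycle_days=28):
    """Yield (cycle_number, day) for each 'on' day from start to end inclusive."""
    day = start
    while day <= end:
        offset = (day - start).days
        # Keep only the first on_days days of every cycle_days-day cycle.
        if offset % cycle_days < on_days:
            yield offset // cycle_days, day
        day += timedelta(days=1)

rows = list(cycle_dates(date(2013, 4, 1), date(2013, 5, 26)))
print(rows[20])
print(rows[21])
```

The resulting pairs can be turned into new dataframe rows, with the cycle number mapped to whatever label (x, y, ...) you need.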
I would like to be able to pass in text from a user or file to filter a pandas DataFrame, and query seems like the best way to handle it. However, I have a datetime index and can't seem to figure out a way to use timedeltas. I know I can filter dates with > or < like
query_string = 'index < datetime.datetime(2020, 2, 20, 11, 8, 19, 615268)'
df.query(query_string)
and
date = datetime.datetime.now()
query_string = 'index < @date'
df.query(query_string)
What I want to do is get a relative date range, like getting the last 10 seconds of entries
date = datetime.datetime.now()
query_string = 'index > @date - datetime.timedelta(seconds=10)'
df.query(query_string)
This fails, and I can't seem to find a way to filter anything relative to a timestamp. Is there any other way to format it so I can add/subtract a time from a date using df.query()?
query() does not support timedelta inside the expression (checked with pandas 1.0.5). You can work around it with boolean indexing on the index:
df[df.index >= datetime.datetime.now() - datetime.timedelta(seconds=10)]
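If you want to keep using query(), another option is to do the date arithmetic in Python first and pass only the result into the expression with @. A minimal sketch, assuming the frame has an unnamed DatetimeIndex (so it can be referred to as index); the helper name rows_since and the sample data are made up for illustration:

```python
import pandas as pd

def rows_since(df, seconds, now=None):
    """Return rows whose index is later than `now` minus `seconds` seconds."""
    if now is None:
        now = pd.Timestamp.now()
    # Do the timedelta arithmetic outside the query string.
    cutoff = now - pd.Timedelta(seconds=seconds)
    # "@cutoff" refers to the local variable above; "index" is the frame's index.
    return df.query('index > @cutoff')

# Sample frame: one row per second for the 30 seconds ending at `now`.
now = pd.Timestamp('2020-02-20 11:08:19')
idx = pd.DatetimeIndex([now - pd.Timedelta(seconds=s) for s in range(29, -1, -1)])
df = pd.DataFrame({'value': range(30)}, index=idx)

recent = rows_since(df, 10, now=now)
print(recent)
```

A user-supplied filter would then only need to provide the number of seconds, not a full expression.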