I am trying to code a function called days15(). The function will be passed an argument called ‘myDateStr’. myDateStr is a string representation of a date in the form 20170817 (that is, YearMonthDay). The code in the function will create a datetime object from the string, and then create a timedelta object with a length of 1 day. Then it will use a list comprehension to produce a list of 15 datetime objects, starting with the date that is passed to the function.
The function should return the following list:
[datetime.datetime(2017, 8, 17, 0, 0), datetime.datetime(2017, 8, 18, 0, 0), datetime.datetime(2017, 8, 19, 0, 0), datetime.datetime(2017, 8, 20, 0, 0), datetime.datetime(2017, 8, 21, 0, 0), datetime.datetime(2017, 8, 22, 0, 0), datetime.datetime(2017, 8, 23, 0, 0), datetime.datetime(2017, 8, 24, 0, 0), datetime.datetime(2017, 8, 25, 0, 0), datetime.datetime(2017, 8, 26, 0, 0), datetime.datetime(2017, 8, 27, 0, 0), datetime.datetime(2017, 8, 28, 0, 0), datetime.datetime(2017, 8, 29, 0, 0), datetime.datetime(2017, 8, 30, 0, 0), datetime.datetime(2017, 8, 31, 0, 0)]
I am stuck on the code. I have started with the code below. Please help. Thanks
from datetime import datetime, timedelta
myDateStr = '20170817'
def days15(myDateStr):
    pass  # not sure what goes here yet
Pandas can help you convert strings to datetimes, so first you need to import it:
from datetime import datetime, timedelta
import pandas as pd
myDateStr = '20170817'
Then you can write a function that starts from an empty list and appends each new date to it:
def days15(myDateStr):
    # converting to datetime
    date = pd.to_datetime(myDateStr)
    # creating the list inside the function, so repeated calls start fresh
    datelist = []
    # loop to create 15 datetimes
    for i in range(15):
        newdate = date + timedelta(days=i)
        # adding new dates to the list
        datelist.append(newdate)
    return datelist
Then you can call your function to get a list of 15 datetimes (pandas Timestamp objects, which are a subclass of datetime.datetime):
days15(myDateStr)
As you said, there will be two steps to implement: firstly, convert the string date to a datetime object and secondly, iterate over the next 15 days using timedelta, with a list comprehension or a simple loop.
from datetime import datetime, timedelta
myDateStr = '20170817'
# Parse the string and return a datetime object
def getDateTime(date):
    return datetime(int(date[:4]), int(date[4:6]), int(date[6:]))
# Iterate over the timedelta added to the starting date
def days15(myDateStr):
    return [getDateTime(myDateStr) + timedelta(days=x) for x in range(15)]
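If you'd rather not slice the string by hand, the standard library can parse it directly. A minimal sketch, assuming the input is always an eight-digit YYYYMMDD string: `datetime.strptime` with the `%Y%m%d` format builds the starting datetime, a one-day timedelta is created once, and a list comprehension multiplies it by the offset, which is the structure the question describes.

```python
from datetime import datetime, timedelta


def days15(myDateStr):
    # Parse a 'YYYYMMDD' string such as '20170817' into a datetime at midnight
    start = datetime.strptime(myDateStr, "%Y%m%d")
    # A timedelta object with a length of one day
    one_day = timedelta(days=1)
    # Offsets 0..14 give 15 consecutive days, starting with the date passed in
    return [start + i * one_day for i in range(15)]


print(days15("20170817"))
```

Because `strptime` returns plain `datetime.datetime` objects, the printed list has the same form as the expected output in the question.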
Related
I have data stored in an S3 bucket that uses a "yyyy/MM/dd" layout to store the files per date, like in this sample s3a path: s3a://mybucket/data/2018/07/03. The files under these paths are in json.gz format, and I would like to import each day's files into a Spark DataFrame. After that I want to feed these Spark DataFrames to some code I've already written via a for loop:
for date in date_range:
    s3a = 's3a://mybucket/data/{}/{}/{}/*.json.gz'.format(date.year, date.month, date.day)
    df = spark.read.format('json').option("header", "true").load(s3a)
    # Execute code here
To read the data, I tried to build date_range like this:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).to_pydatetime().tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
date_range
[datetime.datetime(2018, 3, 6, 0, 0),
datetime.datetime(2018, 3, 7, 0, 0),
datetime.datetime(2018, 3, 8, 0, 0),
datetime.datetime(2018, 3, 9, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0),
datetime.datetime(2018, 3, 12, 0, 0)]
The problem is that to_pydatetime() returns the days and months without a leading '0'. How do I make sure that my code returns a list of values with leading zeros, like below:
[datetime.datetime(2018, 03, 06, 0, 0),
datetime.datetime(2018, 03, 07, 0, 0),
datetime.datetime(2018, 03, 08, 0, 0),
datetime.datetime(2018, 03, 09, 0, 0),
datetime.datetime(2018, 03, 10, 0, 0),
datetime.datetime(2018, 03, 11, 0, 0),
datetime.datetime(2018, 03, 12, 0, 0)]
One approach is to use .strftime("%Y/%m/%d"); its %m and %d directives always produce two-digit, zero-padded month and day numbers:
Ex:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).strftime("%Y/%m/%d").tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
print(date_range)
Output:
['2018/03/06',
'2018/03/07',
'2018/03/08',
'2018/03/09',
'2018/03/10',
'2018/03/11',
'2018/03/12']
for date in date_range:
    s3a = 's3a://mybucket/data/{}/*.json.gz'.format(date)
    print(s3a)
s3a://mybucket/data/2018/03/06/*.json.gz
s3a://mybucket/data/2018/03/07/*.json.gz
s3a://mybucket/data/2018/03/08/*.json.gz
s3a://mybucket/data/2018/03/09/*.json.gz
s3a://mybucket/data/2018/03/10/*.json.gz
s3a://mybucket/data/2018/03/11/*.json.gz
s3a://mybucket/data/2018/03/12/*.json.gz
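If you'd rather keep date_range as datetime objects (for example, to reuse them elsewhere in the loop), note that the missing zeros come from formatting, not from the datetime values: a datetime stores its month and day as plain integers, so the padding can be done in the format string instead. A minimal sketch, assuming the same `s3a://mybucket/data/` layout as the question; `s3a_path_for` and `daily_range` are hypothetical helpers, the latter a standard-library stand-in for `pd.date_range(...).to_pydatetime()` so the sketch runs without pandas.

```python
from datetime import datetime, timedelta


def s3a_path_for(date, base="s3a://mybucket/data"):
    # Hypothetical helper: ':02d' pads month and day to two digits with a leading zero
    return "{}/{}/{:02d}/{:02d}/*.json.gz".format(base, date.year, date.month, date.day)


def daily_range(start, end):
    # Hypothetical helper: inclusive list of datetimes from start to end, one per day
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


for date in daily_range(datetime(2018, 3, 6), datetime(2018, 3, 12)):
    print(s3a_path_for(date))
```

This prints the same paths as the strftime version while leaving each `date` available as a datetime inside the loop.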
I would like to filter a pandas DataFrame by timestamp. This works fine for every hour except 0: if I filter for dt.hour == 0, only the date is displayed, not the time. How can I have the time displayed too?
import datetime
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2005, 7, 14, 12, 30),
                                datetime.datetime(2005, 7, 14, 0, 0),
                                datetime.datetime(2005, 7, 14, 10, 30),
                                datetime.datetime(2005, 7, 14, 15, 30)]})
print(df[df['datetime'].dt.hour == 10])
print(df[df['datetime'].dt.hour == 0])
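Use strftime. When every value in the displayed column falls at midnight, pandas prints only the date, although the time is still stored; formatting the values as strings forces the time to be shown: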
print(df[df['datetime'].dt.hour == 0].datetime.dt.strftime("%Y-%m-%d %H:%M:%S"))
The result is:
1 2005-07-14 00:00:00
Name: datetime, dtype: object
I have a list of datetime objects:
time_range = [datetime.datetime(2019, 7, 9, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 8, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 7, 0, 0, tzinfo=tzutc()),
datetime.datetime(2019, 7, 6, 0, 0, tzinfo=tzutc())
... ]
And I have another datetime object:
time = datetime(2019, 7, 7)
I have to test if time is in time_range.
But each time I test:
time in time_range
I get the output False, because my datetime has no tzinfo.
Here's what I've tried:
I tried to add the tzinfo:
time = datetime(2019, 7, 7, tzinfo=tzutc())
but I can't find where the tzutc() function is.
I also tried to use pandas:
import pandas as pd
pd.to_datetime(str(time) + '+00:00')
I get the time in UTC:
Timestamp('2019-07-05 00:00:00+0000', tz='UTC')
But this is not a datetime.datetime object...
Do you have any idea how I could do this?
(Note: I have to use the form time in time_range because of the rest of my program.)
In the datetime constructor, the tzinfo parameter expects a tzinfo object, such as the standard library's timezone.utc; the documentation doesn't make this very clear. Try this:
from datetime import datetime, timezone
dt = datetime(2019, 7, 7, tzinfo=timezone.utc)
>>> from datetime import datetime, timezone
>>> time = datetime(2019, 7, 7, tzinfo=timezone.utc)
>>> print(time)
2019-07-07 00:00:00+00:00
>>> print(time.tzinfo)
UTC
After some research, I found another solution, using pandas:
utc_time = pd.to_datetime(str(time) + '+00:00').to_pydatetime()
This returns a datetime.datetime object:
datetime.datetime(2019, 7, 7, 0, 0, tzinfo=<UTC>)
However, to avoid importing the pandas library, here's the solution I used:
from datetime import datetime, timezone
new_time = time.replace(tzinfo=timezone.utc)
print(new_time in time_range)
# True
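Why the original test returned False: `in` compares elements with `==`, and in Python 3 an equality test between a naive datetime (no tzinfo) and an aware one simply returns False rather than raising. A minimal standard-library sketch of both cases, using `timezone.utc` in place of dateutil's `tzutc()` (which lives in `dateutil.tz`, if you want to build the original objects); aware datetimes compare by the instant they represent, so `timezone.utc` and `tzutc()` values for the same moment are equal. Note that `replace(tzinfo=...)` attaches the zone without shifting the clock time, which is right only when the naive value is already meant as UTC.

```python
from datetime import datetime, timezone

# Aware datetimes, like the ones in the question (timezone.utc instead of tzutc())
time_range = [
    datetime(2019, 7, 9, tzinfo=timezone.utc),
    datetime(2019, 7, 8, tzinfo=timezone.utc),
    datetime(2019, 7, 7, tzinfo=timezone.utc),
    datetime(2019, 7, 6, tzinfo=timezone.utc),
]

naive = datetime(2019, 7, 7)
# Naive vs aware: == returns False for every element, so membership fails
print(naive in time_range)  # False

# Attaching UTC makes the value aware, so it matches the list entry for July 7
aware = naive.replace(tzinfo=timezone.utc)
print(aware in time_range)  # True
```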
When creating a pandas dataframe object (python 2.7.9, pandas 0.16.2), the first datetime field gets automatically converted into a pandas timestamp. Why? Is it possible to prevent this so as to keep the field in the original type?
Please see the code below:
import numpy as np
import datetime
import pandas
# create a dict
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
'vstart':np.array([datetime.datetime(2001, 11, 16, 0, 0),
datetime.datetime(2012, 2, 28, 0, 0), datetime.datetime(2014, 12, 22, 0, 0)],
dtype=object),
'vstop': np.array([datetime.datetime(2012, 2, 28, 0, 0),
datetime.datetime(2014, 12, 22, 0, 0), datetime.datetime(9999, 12, 31, 0, 0)],
dtype=object),
'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'],
dtype='|S18')}
So, the vstart and vstop keys are datetime so far. However, after:
df = pandas.DataFrame(data = x)
the vstart column becomes a pandas Timestamp automatically, while vstop remains a datetime:
type(df.vstart[0])
#class 'pandas.tslib.Timestamp'
type(df.vstop[0])
#type 'datetime.datetime'
I don't understand why the first datetime column that the constructor comes across gets converted to Timestamp by pandas, and I don't know how to tell pandas to keep the data types as they are. Can you help? Thank you.
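Actually, I've noticed something in your data: it has nothing to do with which date column comes first. In your vstop column there is a datetime with the value dt.datetime(9999, 12, 31, 0, 0). Pandas' nanosecond-resolution timestamps can only represent dates up to the year 2262, so that column can't be converted and is left as plain datetime objects. If you change the year on this date to a normal year, like 2020 for example, both columns will be treated the same.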
Just note that I'm importing the datetime module as dt (and numpy as np, pandas as pd):
import datetime as dt
import numpy as np
import pandas as pd
x = {'cusip': np.array(['10553M10', '67085120', '67085140'], dtype='|S8'),
'vstop': np.array([dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0), dt.datetime(2020, 12, 31, 0, 0)], dtype=object),
'vstart': np.array([dt.datetime(2001, 11, 16, 0, 0),dt.datetime(2012, 2, 28, 0, 0), dt.datetime(2014, 12, 22, 0, 0)], dtype=object),
'id': np.array(['EQ0000000000041095', 'EQ0000000000041095', 'EQ0000000000041095'], dtype='|S18')}
In [27]:
df = pd.DataFrame(x)
df
Out[27]:
cusip id vstart vstop
10553M10 EQ0000000000041095 2001-11-16 2012-02-28
67085120 EQ0000000000041095 2012-02-28 2014-12-22
67085140 EQ0000000000041095 2014-12-22 2020-12-31
In [25]:
type(df.vstart[0])
Out[25]:
pandas.tslib.Timestamp
In [26]:
type(df.vstop[0])
Out[26]:
pandas.tslib.Timestamp
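Where the 2262 limit comes from: pandas stores datetime64[ns] values as a signed 64-bit count of nanoseconds since 1970-01-01, so only dates between roughly 1677-09-21 and 2262-04-11 fit (pandas exposes these as `pd.Timestamp.min` and `pd.Timestamp.max`). The arithmetic can be checked with the standard library alone. A minimal sketch; `fits_datetime64_ns` is a hypothetical helper, not a pandas function, and because `timedelta` only resolves microseconds the edges are truncated.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# Largest signed 64-bit integer, read as nanoseconds since the epoch
MAX_NS = 2**63 - 1
# timedelta resolves microseconds only, so drop the last three digits
LIMIT = timedelta(microseconds=MAX_NS // 1000)

upper = EPOCH + LIMIT
lower = EPOCH - LIMIT


def fits_datetime64_ns(value):
    # Hypothetical helper: True if a naive datetime lies inside the ns range
    return lower <= value <= upper


print(upper)  # 2262-04-11 23:47:16.854775
print(lower)
print(fits_datetime64_ns(datetime(2020, 12, 31)))  # True
print(fits_datetime64_ns(datetime(9999, 12, 31)))  # False
```

So dt.datetime(9999, 12, 31) falls outside the range, which is why vstop keeps object dtype until that value is replaced.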
I want to write a function that returns a tuple (start, end), where start is the Monday at 00:00:00.000000 and end is the Sunday at 23:59:59.999999. start and end are datetime objects. No other information is given about the day, month or year. I tried this function:
from datetime import datetime
def week_start_end(date):
    start = date.strptime("00:00:00.000000", "%H:%M:%S.%f")
    end = date.strptime("23:59:59.999999", "%H:%M:%S.%f")
    return (start, end)
print(week_start_end(datetime(2013, 8, 15, 12, 0, 0)))
This should return (datetime(2013, 8, 11, 0, 0, 0, 0), datetime(2013, 8, 17, 23, 59, 59, 999999)),
but instead the function returns a tuple with the dates (datetime.datetime(1900, 1, 1, 0, 0), datetime.datetime(1900, 1, 1, 23, 59, 59, 999999)).
I think using datetime.isocalendar is a nice solution. This gives the correct output for your example:
import datetime
def iso_year_start(iso_year):
    "The gregorian calendar date of the first day of the given ISO year"
    fourth_jan = datetime.date(iso_year, 1, 4)
    delta = datetime.timedelta(fourth_jan.isoweekday()-1)
    return fourth_jan - delta
def iso_to_gregorian(iso_year, iso_week, iso_day):
    "Gregorian calendar date for the given ISO year, week and day"
    year_start = iso_year_start(iso_year)
    return year_start + datetime.timedelta(days=iso_day-1, weeks=iso_week-1)
def week_start_end(date):
    year = date.isocalendar()[0]
    week = date.isocalendar()[1]
    # ISO day 0 is the Sunday before the ISO week's Monday; day 6 is its Saturday
    d1 = iso_to_gregorian(year, week, 0)
    d2 = iso_to_gregorian(year, week, 6)
    d3 = datetime.datetime(d1.year, d1.month, d1.day, 0, 0, 0, 0)
    d4 = datetime.datetime(d2.year, d2.month, d2.day, 23, 59, 59, 999999)
    return (d3, d4)
As an example:
>>> d = datetime.datetime(2013, 8, 15, 12, 0, 0)
>>> print(week_start_end(d))
(datetime.datetime(2013, 8, 11, 0, 0), datetime.datetime(2013, 8, 17, 23, 59, 59, 999999))
This should help you with your problem.
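For comparison, here is a shorter sketch that skips the ISO-calendar conversion and uses `datetime.weekday()` (Monday is 0, Sunday is 6) directly. The `first_weekday` parameter is an addition for illustration: the question's text asks for a Monday-to-Sunday week, while its expected output (2013-08-11 is a Sunday) describes a Sunday-to-Saturday week, so the sketch supports both.

```python
from datetime import datetime, time, timedelta


def week_start_end(date, first_weekday=0):
    # first_weekday uses datetime.weekday() numbering: 0 = Monday ... 6 = Sunday
    # Number of days back from `date` to the first day of its week
    offset = (date.weekday() - first_weekday) % 7
    # Midnight on the first day of the week
    start = datetime.combine(date.date() - timedelta(days=offset), time.min)
    # One microsecond before the next week starts: 23:59:59.999999 on the last day
    end = start + timedelta(days=7) - timedelta(microseconds=1)
    return (start, end)


d = datetime(2013, 8, 15, 12, 0, 0)
print(week_start_end(d))                   # Monday 2013-08-12 to Sunday 2013-08-18
print(week_start_end(d, first_weekday=6))  # Sunday 2013-08-11 to Saturday 2013-08-17
```

Computing `end` as the next week's start minus one microsecond avoids hard-coding 23:59:59.999999 and stays correct if the start time ever changes.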