Why is datetime being truncated in Pandas? - python

I would like to filter a pandas DataFrame by timestamp. This works fine for every hour except 0: if I filter with dt.hour == 0, only the date is displayed, not the time. How can I get the time displayed too?
import datetime
import pandas as pd
df = pd.DataFrame({'datetime': [datetime.datetime(2005, 7, 14, 12, 30),
                                datetime.datetime(2005, 7, 14, 0, 0),
                                datetime.datetime(2005, 7, 14, 10, 30),
                                datetime.datetime(2005, 7, 14, 15, 30)]})
print(df[df['datetime'].dt.hour == 10])
print(df[df['datetime'].dt.hour == 0])

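The time is not actually truncated. When every value being displayed falls exactly at midnight, pandas prints only the date part. To force the time to appear, use strftime: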
print(df[df['datetime'].dt.hour == 0].datetime.dt.strftime("%Y-%m-%d %H:%M:%S"))
The result is:
1 2005-07-14 00:00:00
Name: datetime, dtype: object
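To confirm that the stored value is intact and only the display is shortened, here is a small sketch reusing the question's DataFrame. Replacing the column via assign is just one option and an assumption about what you need: it turns the column into plain strings, so keep the original column if you still need datetime arithmetic.

```python
import datetime
import pandas as pd

df = pd.DataFrame({'datetime': [datetime.datetime(2005, 7, 14, 12, 30),
                                datetime.datetime(2005, 7, 14, 0, 0),
                                datetime.datetime(2005, 7, 14, 10, 30),
                                datetime.datetime(2005, 7, 14, 15, 30)]})

midnight = df[df['datetime'].dt.hour == 0]

# The stored value is still a full timestamp at 00:00:00; only the printout drops the time.
ts = midnight['datetime'].iloc[0]
print(ts.hour, ts.minute)  # 0 0

# For display only: render the column as strings that always include the time.
shown = midnight.assign(datetime=midnight['datetime'].dt.strftime('%Y-%m-%d %H:%M:%S'))
print(shown)
```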

Related

dateutil rruleset: How to combine EXRULE and RDATE correctly?

I have a rruleset with a daily recurrence rule and now I am trying to combine an RDATE with an EXRULE.
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, FR
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
not_on_friday = rrule(freq=DAILY, byweekday=FR, dtstart=datetime(2022, 10, 12))
but_on_friday_21th = datetime(2022, 10, 21)
rules.exrule(not_on_friday)
rules.rdate(but_on_friday_21th)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
>>
[datetime.datetime(2022, 10, 13, 0, 0),
datetime.datetime(2022, 10, 15, 0, 0), # the 14th is excluded as expected
datetime.datetime(2022, 10, 16, 0, 0),
datetime.datetime(2022, 10, 17, 0, 0),
datetime.datetime(2022, 10, 18, 0, 0),
datetime.datetime(2022, 10, 19, 0, 0),
datetime.datetime(2022, 10, 20, 0, 0),
datetime.datetime(2022, 10, 22, 0, 0), # but the 21st is also excluded
datetime.datetime(2022, 10, 23, 0, 0)]
Now, confusingly, when I combine my EXRULE with an EXDATE it works:
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
not_on_friday = rrule(freq=DAILY, byweekday=FR, dtstart=datetime(2022, 10, 12))
but_also_not_on_the_22th_a_saturday = datetime(2022, 10, 22)
rules.exrule(not_on_friday)
rules.exdate(but_also_not_on_the_22th_a_saturday)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
>>
[datetime.datetime(2022, 10, 13, 0, 0),
datetime.datetime(2022, 10, 15, 0, 0), # the 14th still excluded
datetime.datetime(2022, 10, 16, 0, 0),
datetime.datetime(2022, 10, 17, 0, 0),
datetime.datetime(2022, 10, 18, 0, 0),
datetime.datetime(2022, 10, 19, 0, 0),
datetime.datetime(2022, 10, 20, 0, 0), # the 22nd is also excluded, as expected
datetime.datetime(2022, 10, 23, 0, 0)]
So, if possible at all, how to combine RDATE and EXRULE in my rruleset?
In your own answer you note that the exclusion rule is applied last, after all inclusive rules, which does appear to match the RFC. However, at least in dateutil, exrule also accepts an rruleset, so to get what you want you can remove the date you want to keep from the set you pass to exrule, like so:
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, WEEKLY, FR
# Create an rruleset that defaults to every day
rules = rruleset()
daily = rrule(freq=DAILY, dtstart=datetime(2022, 10, 12))
rules.rrule(daily)
# Create an rruleset corresponding to the days we want to *exclude*: every
# Friday, except 2022-10-21
ex_set = rruleset()
ex_set.rrule(rrule(freq=WEEKLY, byweekday=FR, dtstart=datetime(2022, 10, 14)))
ex_set.exdate(datetime(2022, 10, 21))
# Use our second rule set as an exrule
rules.exrule(ex_set)
rules.between(datetime(2022,10,12), datetime(2022,10,24))
Since the date you want to include never appears in the exrule, it is not filtered out:
>>> print("\n".join(map(str,
... map(datetime.date,
... rules.between(datetime(2022, 10, 12),
... datetime(2022, 10, 24))))))
2022-10-13
2022-10-15
2022-10-16
2022-10-17
2022-10-18
2022-10-19
2022-10-20
2022-10-21
2022-10-22
2022-10-23
So apparently EXRULE is no longer part of the iCalendar specification (RFC 5545 deprecated it); only RRULE remains. And the docstring of dateutil's exrule method states:
def exrule(self, exrule):
    """ Include the given rrule instance in the recurrence set exclusion
        list. Dates which are part of the given recurrence rules will not
        be generated, even if some inclusive rrule or rdate matches them.
    """
So even if I add an RDATE, a date that is excluded by a rule added with exrule will not show up in my occurrences. The same goes for the exdate method, which is why my second example works.
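If the exclusion can be expressed as "every weekday except Friday", another option (a different technique from the answers above, sketched under the assumption that your real rule can be folded this way) is to leave Fridays out of the inclusive rule with byweekday instead of using exrule. With no exclusion rule in the set, nothing overrides the RDATE:

```python
from datetime import datetime
from dateutil.rrule import rruleset, rrule, DAILY, MO, TU, WE, TH, SA, SU

rules = rruleset()
# Every day except Friday, written as an inclusive rule instead of an exrule
rules.rrule(rrule(freq=DAILY, byweekday=(MO, TU, WE, TH, SA, SU),
                  dtstart=datetime(2022, 10, 12)))
# Add back the one Friday we want; nothing in the set excludes it
rules.rdate(datetime(2022, 10, 21))

# between() excludes the boundary dates by default (inc=False)
occurrences = rules.between(datetime(2022, 10, 12), datetime(2022, 10, 24))
for dt in occurrences:
    print(dt.date())
```

This keeps the precedence rules out of play entirely, at the cost of only working when the excluded dates can be described by the inclusive rule's own filters.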

Add '0' before days and months using to_pydatetime()

I have data stored in an S3 bucket which uses the "yyyy/MM/dd" format to store the files per date, as in this sample S3a path: s3a://mybucket/data/2018/07/03. The files in these buckets are in json.gz format, and I would like to import all the files for each day into one Spark DataFrame per day. After that I want to feed these Spark DataFrames to some existing code via a for loop:
for date in date_range:
    s3a = 's3a://mybucket/data/{}/{}/{}/*.json.gz'.format(date.year, date.month, date.day)
    df = spark.read.format('json').option("header", "true").load(s3a)
    # Execute code here
In order to read the data, I tried to format the date_range like below:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).to_pydatetime().tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
date_range
[datetime.datetime(2018, 3, 6, 0, 0),
datetime.datetime(2018, 3, 7, 0, 0),
datetime.datetime(2018, 3, 8, 0, 0),
datetime.datetime(2018, 3, 9, 0, 0),
datetime.datetime(2018, 3, 10, 0, 0),
datetime.datetime(2018, 3, 11, 0, 0),
datetime.datetime(2018, 3, 12, 0, 0)]
The problem is that to_pydatetime() returns the days and months without a leading '0'. How do I make sure that my code returns a list of values with '0's, like below:
[datetime.datetime(2018, 03, 06, 0, 0),
datetime.datetime(2018, 03, 07, 0, 0),
datetime.datetime(2018, 03, 08, 0, 0),
datetime.datetime(2018, 03, 09, 0, 0),
datetime.datetime(2018, 03, 10, 0, 0),
datetime.datetime(2018, 03, 11, 0, 0),
datetime.datetime(2018, 03, 12, 0, 0)]
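A datetime stores its year, month and day as integers, so it cannot carry leading zeros (in Python 3, a literal like 03 is not even valid syntax); the zero padding has to be added when you format the date as a string. This is one approach using .strftime("%Y/%m/%d")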
Ex:
from datetime import datetime
import pandas as pd
def return_date_range(start_date, end_date):
    return pd.date_range(start=start_date, end=end_date).strftime("%Y/%m/%d").tolist()
date_range = return_date_range(start_date='2018-03-06', end_date='2018-03-12')
print(date_range)
Output:
['2018/03/06',
'2018/03/07',
'2018/03/08',
'2018/03/09',
'2018/03/10',
'2018/03/11',
'2018/03/12']
for date in date_range:
    s3a = 's3a://mybucket/data/{}/*.json.gz'.format(date)
    print(s3a)
s3a://mybucket/data/2018/03/06/*.json.gz
s3a://mybucket/data/2018/03/07/*.json.gz
s3a://mybucket/data/2018/03/08/*.json.gz
s3a://mybucket/data/2018/03/09/*.json.gz
s3a://mybucket/data/2018/03/10/*.json.gz
s3a://mybucket/data/2018/03/11/*.json.gz
s3a://mybucket/data/2018/03/12/*.json.gz
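If you would rather keep the datetime objects from to_pydatetime(), a sketch of the alternative is to apply the zero padding only when building the path, using a strftime-style format spec inside an f-string. The function name s3a_paths is hypothetical, and the bucket path is the example one from the question:

```python
import pandas as pd

def s3a_paths(start_date, end_date):
    # Keep real datetime objects; zero-pad only when formatting the path
    dates = pd.date_range(start=start_date, end=end_date).to_pydatetime()
    return [f's3a://mybucket/data/{d:%Y/%m/%d}/*.json.gz' for d in dates]

paths = s3a_paths('2018-03-06', '2018-03-08')
for p in paths:
    print(p)
```

Equivalently, the original three-field format call works if each field gets a padding spec: '{}/{:02d}/{:02d}'.format(date.year, date.month, date.day).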

converting an array of unixtime to mmddyy on python

I have an array of Unix timestamps. How do I convert it using datetime.utcfromtimestamp().strftime("%Y-%M-%D %H:%M:%S")? My array is stored in a variable called "time". How do I use that array in this conversion?
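Assuming your times are already datetime objects, you can loop through the list and convert each one. (If they are still Unix timestamps, convert each one first, e.g. with datetime.datetime.utcfromtimestamp(t).)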
Here is a quick example:
import datetime
time = []
for i in range(10):
    time.append(datetime.datetime.now())
print(time)  # output: [datetime.datetime(2020, 7, 8, 10, 7, 4, 314614), datetime.datetime(2020, 7, 8, 10, 7, 4, 314622)....
formattedTime = []
for t in time:
    formattedTime.append(t.strftime('%Y-%m-%d %H:%M:%S'))
print(formattedTime)  # output: ['2020-07-08 10:07:04', '2020-07-08 10:07:04', ....
# the update to my answer: parse the strings back into datetime objects
newTimes = []
for date_time_str in formattedTime:
    newTimes.append(datetime.datetime.strptime(date_time_str, '%Y-%m-%d %H:%M:%S'))
print(newTimes)  # output: [datetime.datetime(2020, 7, 8, 10, 7, 4), datetime.datetime(2020, 7, 8, 10, 7, 4), ...]
Let me know if you have more questions.
I am also attaching this article for datetime which I found really helpful.
Here is the example in repl
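Since the question starts from Unix timestamps rather than datetime objects, here is a sketch of the direct conversion, assuming the timestamps are in seconds and you want UTC. Note that the format in the question, "%Y-%M-%D", uses %M (minutes) and %D (shorthand for %m/%d/%y); for year-month-day you want %Y-%m-%d. datetime.fromtimestamp(t, tz=timezone.utc) is used here because utcfromtimestamp is deprecated since Python 3.12. The sample values are illustrative only.

```python
from datetime import datetime, timezone

# Illustrative sample data: Unix timestamps in seconds
time = [0, 1594202824]

# Convert each timestamp to an aware UTC datetime, then format it
formatted = [datetime.fromtimestamp(t, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
             for t in time]
print(formatted)  # ['1970-01-01 00:00:00', '2020-07-08 10:07:04']

# For the mmddyy form from the title, use %m%d%y instead
mmddyy = [datetime.fromtimestamp(t, tz=timezone.utc).strftime('%m%d%y') for t in time]
print(mmddyy)  # ['010170', '070820']
```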

Generate random list of timestamps within multiple time intervals in python

Is there an efficient way to generate a list of N random, non-overlapping time periods, given the initial lower and upper bounds and the length each period should have? For example, in the following case I want 10 periods between 09:00 and 17:00:
Initial start time: {datetime} YYYY-MM-DD 09:00:00
Initial end time: {datetime} YYYY-MM-DD 17:00:00
Timestamp intervals (in minutes): [32 24 4 20 40 8 27 18 3 4]
where the first time period is 32 minutes long, the next 24, and so on.
The way I am doing it at the moment is by using more or less the following code snippet:
import random
from datetime import datetime, timedelta
import numpy as np
def random_time(start, end, timeframe=None):
    sec_diff = int((end - start).total_seconds())
    secs_to_add = random.randint(0, sec_diff)
    return start + timedelta(seconds=secs_to_add)
def in_datetimes_range(x, starts, ends):
    return np.any((starts <= x) & (x <= ends))
n = 10
dadate = datetime.now()
year = dadate.year
month = dadate.month
day = dadate.day
start = datetime(year, month, day, 9, 0, 0)
end = datetime(year, month, day, 17, 0, 0)
timeframe = [32, 24, 4, 20, 40, 8, 27, 18, 3, 4]
startTimes = []
endTimes = []
for i in range(0, n):
    while True:
        startTime = random_time(start, end)
        endTime = startTime + timedelta(minutes=int(timeframe[i]))
        if startTimes:
            startTimesAsNpArray = np.array(startTimes)
            endTimesAsNpArray = np.array(endTimes)
            # check if new time period falls inside existing timeframes or if existing timeframes fall within new time period
            inner_bound = np.logical_or(in_datetimes_range(startTime, startTimesAsNpArray, endTimesAsNpArray), in_datetimes_range(endTime, startTimesAsNpArray, endTimesAsNpArray))
            outer_bound = np.logical_or(in_datetimes_range(startTimesAsNpArray, startTime, endTime), in_datetimes_range(endTimesAsNpArray, startTime, endTime))
            if inner_bound or outer_bound:
                continue  # overlaps an existing period: draw again
        # first period, or no overlap found: keep it
        startTimes.append(startTime)
        endTimes.append(endTime)
        break
but this is really inefficient and I was looking for something more reliable if possible.
Here is a way to do it: the idea is that if we remove the total duration of the periods from the time available, generate start times in the period that is left, and then postpone them with the cumulated periods before them, we are sure that the intervals won't overlap.
from datetime import datetime, timedelta
import random
def generate_periods(start, end, durations):
    durations = [timedelta(minutes=m) for m in durations]
    total_duration = sum(durations, timedelta())
    nb_periods = len(durations)
    open_duration = (end - start) - total_duration
    delays = sorted(timedelta(seconds=s)
                    for s in random.sample(range(0, int(open_duration.total_seconds())), nb_periods))
    periods = []
    periods_before = timedelta()
    for delay, duration in zip(delays, durations):
        periods.append((start + delay + periods_before,
                        start + delay + periods_before + duration))
        periods_before += duration
    return periods
Sample run:
durations = [32, 24, 4, 20, 40, 8, 27, 18, 3, 4]
start_time = datetime(2019, 9, 2, 9, 0, 0)
end_time = datetime(2019, 9, 2, 17, 0, 0)
generate_periods(start_time, end_time, durations)
# [(datetime.datetime(2019, 9, 2, 9, 16, 1),
# datetime.datetime(2019, 9, 2, 9, 48, 1)),
# (datetime.datetime(2019, 9, 2, 9, 58, 57),
# datetime.datetime(2019, 9, 2, 10, 22, 57)),
# (datetime.datetime(2019, 9, 2, 10, 56, 41),
# datetime.datetime(2019, 9, 2, 11, 0, 41)),
# (datetime.datetime(2019, 9, 2, 11, 2, 37),
# datetime.datetime(2019, 9, 2, 11, 22, 37)),
# (datetime.datetime(2019, 9, 2, 11, 48, 17),
# datetime.datetime(2019, 9, 2, 12, 28, 17)),
# (datetime.datetime(2019, 9, 2, 13, 4, 28),
# datetime.datetime(2019, 9, 2, 13, 12, 28)),
# (datetime.datetime(2019, 9, 2, 15, 13, 3),
# datetime.datetime(2019, 9, 2, 15, 40, 3)),
# (datetime.datetime(2019, 9, 2, 16, 6, 44),
# datetime.datetime(2019, 9, 2, 16, 24, 44)),
# (datetime.datetime(2019, 9, 2, 16, 37, 42),
# datetime.datetime(2019, 9, 2, 16, 40, 42)),
# (datetime.datetime(2019, 9, 2, 16, 42, 50),
# datetime.datetime(2019, 9, 2, 16, 46, 50))]
Like this?
import pandas as pd
from datetime import datetime
date = datetime.now()
start = datetime(date.year, date.month, date.day, 9, 0, 0)
end = datetime(date.year, date.month, date.day, 17, 0, 0)
interval = 32
# date_range needs an integer number of periods
periods = int((end - start).total_seconds() // 60 // interval)
times = pd.date_range(start, periods=periods, freq=str(interval) + 'min')
or like this
# =============================================================================
# or if you want the results as a dataframe
# =============================================================================
def xyz(interval):
    date = datetime.now()
    start = datetime(date.year, date.month, date.day, 9, 0, 0)
    end = datetime(date.year, date.month, date.day, 17, 0, 0)
    periods = int((end - start).total_seconds() // 60 // interval)
    return pd.date_range(start, periods=periods, freq=str(interval) + 'min')
timeframes = [32,24,4,20,40,8,27,18,3,4]
df_output=pd.DataFrame(index=timeframes, data=[xyz(x) for x in timeframes])

Datetime object from a string for 15 days frequency

I am trying to code a function called days15(). The function will be passed an argument called ‘myDateStr’, a string representation of a date in the form 20170817 (that is, YearMonthDay). The code in the function will create a datetime object from the string, then create a timedelta object with a length of 1 day. Then it will use a list comprehension to produce a list of 15 datetime objects, starting with the date that is passed to the function.
The function should return the following list:
[datetime.datetime(2017, 8, 17, 0, 0), datetime.datetime(2017, 8, 18, 0, 0), datetime.datetime(2017, 8, 19, 0, 0), datetime.datetime(2017, 8, 20, 0, 0), datetime.datetime(2017, 8, 21, 0, 0), datetime.datetime(2017, 8, 22, 0, 0), datetime.datetime(2017, 8, 23, 0, 0), datetime.datetime(2017, 8, 24, 0, 0), datetime.datetime(2017, 8, 25, 0, 0), datetime.datetime(2017, 8, 26, 0, 0), datetime.datetime(2017, 8, 27, 0, 0), datetime.datetime(2017, 8, 28, 0, 0), datetime.datetime(2017, 8, 29, 0, 0), datetime.datetime(2017, 8, 30, 0, 0), datetime.datetime(2017, 8, 31, 0, 0)]
I am stuck on the code. I have started with the code below. Please help. Thanks
from datetime import datetime, timedelta
myDateStr = '20170817'
def days15(myDateStr):
    pass  # body still to be written
Pandas will help you in converting strings to datetime, so first you need to import it:
from datetime import datetime, timedelta
import pandas as pd
myDateStr = '20170817'
Then write a function that builds the list and returns it. Keeping the list inside the function means that calling it twice does not keep appending to the same global list:
def days15(myDateStr):
    # converting to datetime (pd.to_datetime returns a Timestamp; convert it to a plain datetime)
    date = pd.to_datetime(myDateStr).to_pydatetime()
    datelist = []
    # loop to create 15 datetimes
    for i in range(15):
        newdate = date + timedelta(days=i)
        # adding new dates to the list
        datelist.append(newdate)
    return datelist
and then you can call your function and get a list of 15 datetimes:
days15(myDateStr)
As you said, there will be two steps to implement: firstly, convert the string date to a datetime object and secondly, iterate over the next 15 days using timedelta, with a list comprehension or a simple loop.
from datetime import datetime, timedelta
myDateStr = '20170817'
# Parse the string and return a datetime object
def getDateTime(date):
    return datetime(int(date[:4]), int(date[4:6]), int(date[6:]))
# Iterate over the timedelta added to the starting date
def days15(myDateStr):
    return [getDateTime(myDateStr) + timedelta(days=x) for x in range(15)]
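The question describes exactly three steps: parse the string, build a one-day timedelta, and use a list comprehension. Here is a sketch doing that with the standard library only, using datetime.strptime with the %Y%m%d format as an alternative to slicing the string by hand:

```python
from datetime import datetime, timedelta

def days15(myDateStr):
    start = datetime.strptime(myDateStr, '%Y%m%d')  # '20170817' -> 2017-08-17 00:00
    one_day = timedelta(days=1)
    # Multiply the one-day delta by 0..14 to get 15 consecutive dates
    return [start + i * one_day for i in range(15)]

result = days15('20170817')
print(result[0], result[-1])  # 2017-08-17 00:00:00 2017-08-31 00:00:00
```

strptime also rejects malformed input with a ValueError, which manual slicing would not do.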
