Comparing dates in list with date ranges in dataframe - python

I'm having difficulty figuring out a way to count the occurrences of holidays between datetime ranges in a dataframe. The holidays are in a list while the datetime ranges are in the dataframe as shown below: (note that this is a subset of a very large data set)
import pandas as pd
from datetime import date

df = pd.DataFrame({'Date': ['2018-12-19 18:47', '2019-01-01 06:11', '2019-01-12 10:05', '2019-02-17 14:22', '2019-03-08 16:17', '2019-03-25 17:35', '2019-02-14 17:35'],
                   'End Date': ['2018-12-28 18:47', '2019-01-05 06:11', '2019-01-16 10:05', '2019-02-19 14:22', '2019-03-12 16:17', '2019-03-26 17:35', '2019-05-27 17:35']})
df['Date'] = pd.to_datetime(df['Date'])
df['End Date'] = pd.to_datetime(df['End Date'])
Holidays = [date(2018,12,24), date(2018,12,25), date(2019,1,1), date(2019,1,21), date(2019,2,18), date(2019,3,8), date(2019,5,27)]
I've been able to find a way that determines whether or not a holiday falls within each datetime range, but not to get an actual count.
Is there a way to alter the code below to return the count rather than boolean values?
This is what I've tried so far:
df['Holidays'] = [any((z >= x) & (z <= y) for z in Holidays) for x, y in zip(df['Date'].dt.date, df['End Date'].dt.date)]
The result I'm looking for is as follows:
result = pd.DataFrame({'Date': ['2018-12-19 18:47','2019-01-01 06:11','2019-01-12 10:05','2019-02-17 14:22','2019-03-08 16:17','2019-03-25 17:35','2019-02-14 17:35'],
'End Date': ['2018-12-28 18:47','2019-01-05 06:11','2019-01-16 10:05','2019-02-19 14:22','2019-03-12 16:17','2019-03-26 17:35','2019-05-27 17:35'],
'Holidays': [2,1,0,1,1,0,3]})

We can make a function that counts the holidays satisfying this condition and then apply it row-wise.
def fn(series):
    # count the holidays falling between the row's start (iloc[0]) and end (iloc[1])
    return sum(series.iloc[0] <= h <= series.iloc[1] for h in Holidays)

df.assign(Holidays=df.apply(fn, axis=1))
Date End Date Holidays
0 2018-12-19 18:47:00 2018-12-28 18:47:00 2
1 2019-01-01 06:11:00 2019-01-05 06:11:00 0
2 2019-01-12 10:05:00 2019-01-16 10:05:00 0
3 2019-02-17 14:22:00 2019-02-19 14:22:00 1
4 2019-03-08 16:17:00 2019-03-12 16:17:00 0
5 2019-03-25 17:35:00 2019-03-26 17:35:00 0
6 2019-02-14 17:35:00 2019-05-27 17:35:00 3
Your desired output doesn't match this because the holidays are plain dates (midnight), while your timestamps carry times of day. To get the output that you posted, we have to round the timestamps down to the day.
def fn(series):
    # floor both endpoints to midnight so the date-only holidays compare as whole days
    return sum(series.iloc[0].floor('d') <= h <= series.iloc[1].floor('d') for h in Holidays)

df.assign(Holidays=df.apply(fn, axis=1))
Date End Date Holidays
0 2018-12-19 18:47 2018-12-28 18:47 2
1 2019-01-01 06:11 2019-01-05 06:11 1
2 2019-01-12 10:05 2019-01-16 10:05 0
3 2019-02-17 14:22 2019-02-19 14:22 1
4 2019-03-08 16:17 2019-03-12 16:17 1
5 2019-03-25 17:35 2019-03-26 17:35 0
6 2019-02-14 17:35 2019-05-27 17:35 3
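If the frame is large, the row-wise apply can be replaced by numpy broadcasting. A minimal sketch, assuming the same df and Holidays as above:
import numpy as np

hv = pd.to_datetime(pd.Series(Holidays)).to_numpy()   # holidays as datetime64[ns]
start = df['Date'].dt.floor('d').to_numpy()           # floor to midnight, as above
end = df['End Date'].dt.floor('d').to_numpy()
# one row per range, one column per holiday; count the matches per row
df['Holidays'] = ((hv >= start[:, None]) & (hv <= end[:, None])).sum(axis=1)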

Related

Pandas datetime - keep time only as dtype datetime

I want the time without the date in Pandas.
I want to keep the time as dtype datetime64[ns] and not as an object so that I can determine periods between times.
The closest I have gotten is as follows, but it gives back the date in a new column, not the time as dtype datetime that I need.
df_pres_mf['time'] = pd.to_datetime(df_pres_mf['time'], format='%H:%M', errors='coerce')  # returns the date (1900-01-01) and the actual time, as dtype datetime64[ns]
df_pres_mf['just_time'] = df_pres_mf['time'].dt.date
df_pres_mf['normalised_time'] = df_pres_mf['time'].dt.normalize()
df_pres_mf.head()
Returns the date as 1900-01-01 and not the time that is needed.
Edit: Data
time
1900-01-01 11:16:00
1900-01-01 15:20:00
1900-01-01 09:55:00
1900-01-01 12:01:00
You could do it like Vishnudev suggested but then you would have dtype: object (or even strings, after using dt.strftime), which you said you didn't want.
What you are looking for doesn't exist; the closest thing that I can get you is converting to timedeltas, which won't seem like a solution at first but is actually very useful.
Convert it like this:
# sample df
df
>>
time
0 2021-02-07 09:22:00
1 2021-05-10 19:45:00
2 2021-01-14 06:53:00
3 2021-05-27 13:42:00
4 2021-01-18 17:28:00
df["timed"] = df.time - df.time.dt.normalize()
df
>>
time timed
0 2021-02-07 09:22:00 0 days 09:22:00 # this is just the time difference
1 2021-05-10 19:45:00 0 days 19:45:00 # since midnight, which is essentially the
2 2021-01-14 06:53:00 0 days 06:53:00 # same thing as regular time, except
3 2021-05-27 13:42:00 0 days 13:42:00 # that you can go over 24 hours
4 2021-01-18 17:28:00 0 days 17:28:00
this allows you to calculate periods between times like this:
# subtract the last time from the current
df["difference"] = df.timed - df.timed.shift()
df
Out[48]:
time timed difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 # <-- this is because the last
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 # time was later than the current
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 # (see below)
To get rid of the negative differences, take the absolute value:
df["abs_difference"] = df.difference.abs()
df
>>
time timed difference abs_difference
0 2021-02-07 09:22:00 0 days 09:22:00 NaT NaT
1 2021-05-10 19:45:00 0 days 19:45:00 0 days 10:23:00 0 days 10:23:00
2 2021-01-14 06:53:00 0 days 06:53:00 -1 days +11:08:00 0 days 12:52:00 ### <<--
3 2021-05-27 13:42:00 0 days 13:42:00 0 days 06:49:00 0 days 06:49:00
4 2021-01-18 17:28:00 0 days 17:28:00 0 days 03:46:00 0 days 03:46:00
Use the proper format string for your date format and convert to datetime:
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
Then format to the preferred representation:
df['time'].dt.strftime('%H:%M')
Output
0 11:16
1 15:20
2 09:55
3 12:01
Name: time, dtype: object
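As a quick check (a sketch against the same df), the timedelta route keeps a computable dtype while strftime yields plain objects:
timed = df["time"] - df["time"].dt.normalize()
print(timed.dtype)                             # timedelta64[ns]
print(df["time"].dt.strftime("%H:%M").dtype)   # object
print(timed.mean())                            # aggregations still work on timedeltas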

I would like to return the maximum value of a column

My data: I have 3 columns that contain wind speed, wind direction and a time index.
I want to return the daily maximum wind speed together with the time and the wind direction at that time.
I used the command below:
df['max_day']=df.wind.resample('1D').max()
but it always returns the time as 00:00.
Here's a sample of the data:
time vento10m_azul dir
2019-01-01 1:00:00 7.4527917 84.17657707
2019-01-01 2:00:00 7.571505 82.76253884
2019-01-01 3:00:00 7.529691 78.80457605
2019-01-01 4:00:00 7.2273316 76.08609884
2019-01-01 5:00:00 6.985468 75.99220721
2019-01-02 0:00:00 5.5748515 76.23670838
2019-01-02 1:00:00 5.1289306 66.44264187
2019-01-02 2:00:00 4.63257 57.76554662
2019-01-02 3:00:00 4.036444 48.3211454
2019-01-02 4:00:00 3.26109 47.26135372
2019-01-02 5:00:00 2.6211443 53.60521783
A fuller one-month sample is at this link:
https://drive.google.com/open?id=133E7xA3h5StVjlgVqqnfwFmRTFR2HcUE
Try doing this:
df['time'] = pd.to_datetime(df['time'])
# for each day, idxmax gives the label of the row with the highest wind speed;
# selecting those labels keeps the matching time and wind direction
df = df.iloc[df.groupby(pd.Grouper(key='time', freq='1D'))['vento10m_azul'].idxmax()]
df['time'] = df['time'].dt.date
df = df.reset_index(drop=True)
print(df)
Output:
time vento10m_azul dir
0 2019-01-01 7.571505 82.762539
1 2019-01-02 6.582745 43.261218
2 2019-01-03 7.914436 26.962216
3 2019-01-04 8.309497 354.637982
4 2019-01-05 9.034869 143.472224
5 2019-01-06 6.909633 113.542660
6 2019-01-07 8.210649 23.854406
7 2019-01-08 8.628985 29.572357
8 2019-01-09 9.898343 64.477980
9 2019-01-10 10.570002 49.819634
10 2019-01-11 5.311725 27.333261
11 2019-01-12 4.922985 79.928011
12 2019-01-13 7.385470 63.877019
13 2019-01-14 8.799546 40.721517
14 2019-01-15 7.766147 51.942357
15 2019-01-16 8.430967 295.331752
16 2019-01-17 7.590732 4.340045
17 2019-01-18 5.254148 96.465752
18 2019-01-19 4.975754 13.093988
19 2019-01-20 8.721619 178.418132
20 2019-01-21 2.412958 78.999404
21 2019-01-22 7.567795 127.181465
22 2019-01-23 6.668825 106.142476
23 2019-01-24 7.524504 142.564668
24 2019-01-25 7.676533 52.388050
25 2019-01-26 7.374160 47.992977
26 2019-01-27 10.085866 45.983522
27 2019-01-28 8.340270 50.408780
28 2019-01-29 6.613598 61.931717
29 2019-01-30 6.229586 58.925196
30 2019-01-31 5.741903 47.251849
First, load the CSV file and convert the time field to a datetime format:
import pandas as pd
df = pd.read_csv("my_date.csv")
df["time"] = pd.to_datetime(df.time)
Next, calculate the maximum speed for each day by grouping the data by date, taking the max, and renaming columns appropriately:
max_speed = (df.groupby(df.time.dt.date)["vento10m_azul"]
               .max()
               .reset_index()
               .rename(columns={"time": "date", "vento10m_azul": "max_vento10m_azul"}))
Finally, merge the dataframe containing maximum speed information with the original dataframe containing all the wind speed data. Keep only the rows with values equal to the maximum, and drop other unnecessary columns.
df["date"] = df.time.dt.date
df_x = df.merge(max_speed, on="date")
df_x = df_x[df_x["vento10m_azul"] == df_x["max_vento10m_azul"]]
df_x = df_x[["time", "vento10m_azul"]]

How can I parse a field in a DF into Month, Day, Year, Hour, and Weekday?

I have data that looks like this.
VendorID lpep_pickup_datetime lpep_dropoff_datetime store_and_fwd_flag
2 1/1/2018 0:18:50 1/1/2018 12:24:39 AM N
2 1/1/2018 0:30:26 1/1/2018 12:46:42 AM N
2 1/1/2018 0:07:25 1/1/2018 12:19:45 AM N
2 1/1/2018 0:32:40 1/1/2018 12:33:41 AM N
2 1/1/2018 0:32:40 1/1/2018 12:33:41 AM N
2 1/1/2018 0:38:35 1/1/2018 1:08:50 AM N
2 1/1/2018 0:18:41 1/1/2018 12:28:22 AM N
2 1/1/2018 0:38:02 1/1/2018 12:55:02 AM N
2 1/1/2018 0:05:02 1/1/2018 12:18:35 AM N
2 1/1/2018 0:35:23 1/1/2018 12:42:07 AM N
So, I converted df.lpep_pickup_datetime to datetime, but originally it comes in as a string. I'm not sure which one is easier to work with. I want to append 5 fields onto my current dataframe: year, month, day, weekday, and hour.
I tried this:
df['Year']=[d.split('-')[0] for d in df.lpep_pickup_datetime]
df['Month']=[d.split('-')[1] for d in df.lpep_pickup_datetime]
df['Day']=[d.split('-')[2] for d in df.lpep_pickup_datetime]
That gives me this error: AttributeError: 'Timestamp' object has no attribute 'split'
I tried this:
df2 = pd.DataFrame(df.lpep_pickup_datetime.dt.strftime('%m-%d-%Y-%H').str.split('/').tolist(),
columns=['Month', 'Day', 'Year', 'Hour'],dtype=int)
df = pd.concat((df,df2),axis=1)
That gives me this error: AssertionError: 4 columns passed, passed data had 1 columns
Basically, I want to parse df.lpep_pickup_datetime into year, month, day, weekday, and hour, appending each to the same dataframe. How can I do that?
Thanks!!
Here you go. First I create a random dataset and then rename the date column to the name you want, so you can just copy the code. Pandas has extensive built-in time-series functionality (the time-series section of the pandas documentation has a lot more information), so you don't actually need to import datetime:
import pandas as pd
date_rng = pd.date_range(start='1/1/2018', end='4/01/2018', freq='H')
df = pd.DataFrame(date_rng, columns=['date'])
df['lpep_pickup_datetime'] = df['date']
df['year'] = df['lpep_pickup_datetime'].dt.year
df['month'] = df['lpep_pickup_datetime'].dt.month
df['weekday'] = df['lpep_pickup_datetime'].dt.weekday
df['day'] = df['lpep_pickup_datetime'].dt.day
df['hour'] = df['lpep_pickup_datetime'].dt.hour
print(df)
Output:
date lpep_pickup_datetime year month weekday day hour
0 2018-01-01 00:00:00 2018-01-01 00:00:00 2018 1 0 1 0
1 2018-01-01 01:00:00 2018-01-01 01:00:00 2018 1 0 1 1
2 2018-01-01 02:00:00 2018-01-01 02:00:00 2018 1 0 1 2
3 2018-01-01 03:00:00 2018-01-01 03:00:00 2018 1 0 1 3
4 2018-01-01 04:00:00 2018-01-01 04:00:00 2018 1 0 1 4
... ... ... ... ... ... ... ...
2156 2018-03-31 20:00:00 2018-03-31 20:00:00 2018 3 5 31 20
2157 2018-03-31 21:00:00 2018-03-31 21:00:00 2018 3 5 31 21
2158 2018-03-31 22:00:00 2018-03-31 22:00:00 2018 3 5 31 22
2159 2018-03-31 23:00:00 2018-03-31 23:00:00 2018 3 5 31 23
2160 2018-04-01 00:00:00 2018-04-01 00:00:00 2018 4 6 1 0
EDIT: Since this is not working (as stated in the comments on this answer), I believe your data is formatted incorrectly. Try this before applying anything (note %Y, since your years are four-digit):
df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'], format='%d/%m/%Y %H:%M:%S')
If this format is recognized properly, you should have no trouble using dt.year, dt.month, dt.hour, dt.day and dt.weekday.
Give this a go. Since your dates are in the datetime dtype already, just use the datetime properties to extract each part.
import pandas as pd
from datetime import datetime as dt
# Creating a fake dataset of dates.
dates = [dt.now().strftime('%d/%m/%Y %H:%M:%S') for i in range(10)]
df = pd.DataFrame({'lpep_pickup_datetime': dates})
df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'])
# Parse each date into its parts and store as a new column.
df['month'] = df['lpep_pickup_datetime'].dt.month
df['day'] = df['lpep_pickup_datetime'].dt.day
df['year'] = df['lpep_pickup_datetime'].dt.year
# ... and so on ...
Output:
lpep_pickup_datetime month day year
0 2019-09-24 16:46:10 9 24 2019
1 2019-09-24 16:46:10 9 24 2019
2 2019-09-24 16:46:10 9 24 2019
3 2019-09-24 16:46:10 9 24 2019
4 2019-09-24 16:46:10 9 24 2019
5 2019-09-24 16:46:10 9 24 2019
6 2019-09-24 16:46:10 9 24 2019
7 2019-09-24 16:46:10 9 24 2019
8 2019-09-24 16:46:10 9 24 2019
9 2019-09-24 16:46:10 9 24 2019
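The remaining fields from the question (weekday and hour) follow the same pattern; a quick sketch:
df['weekday'] = df['lpep_pickup_datetime'].dt.weekday  # Monday=0 ... Sunday=6
df['hour'] = df['lpep_pickup_datetime'].dt.hour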

Adding months to a pandas object in Python

I have this dataframe object:
Date
2018-12-14
2019-01-11
2019-01-25
2019-02-08
2019-02-22
2019-07-26
What I want, if it's possible, is to add, for example, 3 months to the dates, then 3 months to the new dates (original date + 3 months), and repeat this x times. I am using pd.offsets.MonthOffset, but this adds the months only once and I need to do it several times.
I don't know if it is possible but any help would be perfect.
Thank you so much for taking your time.
The expected output is (for 1 month adding 2 times):
[[2019-01-14, 2019-02-11, 2019-02-25, 2019-03-08, 2019-03-22, 2019-08-26],[2019-02-14, 2019-03-11, 2019-03-25, 2019-04-08, 2019-04-22, 2019-09-26]]
I believe you need a loop with f-strings for the new column names:
for i in range(1, 4):
    df[f'Date_added_{i}_months'] = df['Date'] + pd.offsets.MonthBegin(i)

print (df)
Date Date_added_1_months Date_added_2_months Date_added_3_months
0 2018-12-14 2019-01-01 2019-02-01 2019-03-01
1 2019-01-11 2019-02-01 2019-03-01 2019-04-01
2 2019-01-25 2019-02-01 2019-03-01 2019-04-01
3 2019-02-08 2019-03-01 2019-04-01 2019-05-01
4 2019-02-22 2019-03-01 2019-04-01 2019-05-01
5 2019-07-26 2019-08-01 2019-09-01 2019-10-01
Or:
for i in range(1, 4):
    df[f'Date_added_{i}_months'] = df['Date'] + pd.offsets.MonthOffset(i)

print (df)
Date Date_added_1_months Date_added_2_months Date_added_3_months
0 2018-12-14 2019-01-14 2019-02-14 2019-03-14
1 2019-01-11 2019-02-11 2019-03-11 2019-04-11
2 2019-01-25 2019-02-25 2019-03-25 2019-04-25
3 2019-02-08 2019-03-08 2019-04-08 2019-05-08
4 2019-02-22 2019-03-22 2019-04-22 2019-05-22
5 2019-07-26 2019-08-26 2019-09-26 2019-10-26
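For reference, the difference between the two variants: MonthBegin(i) rolls each date forward to the first day of a month, while MonthOffset(i) keeps the day of the month, which is what the expected output in the question uses.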
I hope this helps:
from dateutil.relativedelta import relativedelta

month_offset = [3, 6, 9]
for i in month_offset:
    df['Date_plus_' + str(i) + '_months'] = df['Date'].map(lambda x: x + relativedelta(months=i))
If your dates are date objects, it should be pretty easy. You can just create a relativedelta of 3 months (a plain datetime.timedelta has no notion of months) and add it to each date.
Alternatively, you can convert them to date objects with .strptime() and then do what you are suggesting. You can convert them back to a string with .strftime().
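For the exact list-of-lists output shown in the question (1 month added 2 times), a minimal sketch, assuming df['Date'] is already datetime64:
result = [(df['Date'] + pd.DateOffset(months=i)).dt.strftime('%Y-%m-%d').tolist()
          for i in range(1, 3)]
# [['2019-01-14', '2019-02-11', ...], ['2019-02-14', '2019-03-11', ...]]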

Pandas: How to create a datetime object from Week and Year?

I have a dataframe that provides two integer columns with the Year and Week of the year:
import pandas as pd
import numpy as np
L1 = [43,44,51,2,5,12]
L2 = [2016,2016,2016,2017,2017,2017]
df = pd.DataFrame({"Week":L1,"Year":L2})
df
Out[72]:
Week Year
0 43 2016
1 44 2016
2 51 2016
3 2 2017
4 5 2017
5 12 2017
I need to create a datetime-object from these two numbers.
I tried this, but it throws an error:
df["DT"] = df.apply(lambda x: np.datetime64(x.Year,'Y') + np.timedelta64(x.Week,'W'),axis=1)
Then I tried this; it works but gives the wrong result, i.e. it ignores the week completely:
df["S"] = df.Week.astype(str)+'-'+df.Year.astype(str)
df["DT"] = df["S"].apply(lambda x: pd.to_datetime(x,format='%W-%Y'))
df
Out[74]:
Week Year S DT
0 43 2016 43-2016 2016-01-01
1 44 2016 44-2016 2016-01-01
2 51 2016 51-2016 2016-01-01
3 2 2017 2-2017 2017-01-01
4 5 2017 5-2017 2017-01-01
5 12 2017 12-2017 2017-01-01
I'm really getting lost between Python's datetime, Numpy's datetime64, and pandas Timestamp, can you tell me how it's done correctly?
I'm using Python 3, if that is relevant in any way.
EDIT:
Starting with Python 3.8 the problem is easily solved with a newly introduced method on datetime.date objects: https://docs.python.org/3/library/datetime.html#datetime.date.fromisocalendar
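A minimal sketch of that route (Python 3.8+), assuming Monday (day 1) as the anchor weekday:
from datetime import date

df["DT"] = [date.fromisocalendar(y, w, 1) for y, w in zip(df.Year, df.Week)]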
Try this:
In [19]: pd.to_datetime(df.Year.astype(str), format='%Y') + \
pd.to_timedelta(df.Week.mul(7).astype(str) + ' days')
Out[19]:
0 2016-10-28
1 2016-11-04
2 2016-12-23
3 2017-01-15
4 2017-02-05
5 2017-03-26
dtype: datetime64[ns]
If your timestamps are initially in seconds ("Initially I have timestamps in s"), it's much easier to parse them from the UNIX epoch timestamp:
df['Date'] = pd.to_datetime(df['UNIX_Time'], unit='s')
Timing for 10M rows DF:
Setup:
In [26]: df = pd.DataFrame(pd.date_range('1970-01-01', freq='1T', periods=10**7), columns=['date'])
In [27]: df.shape
Out[27]: (10000000, 1)
In [28]: df['unix_ts'] = df['date'].astype(np.int64)//10**9
In [30]: df
Out[30]:
date unix_ts
0 1970-01-01 00:00:00 0
1 1970-01-01 00:01:00 60
2 1970-01-01 00:02:00 120
3 1970-01-01 00:03:00 180
4 1970-01-01 00:04:00 240
5 1970-01-01 00:05:00 300
6 1970-01-01 00:06:00 360
7 1970-01-01 00:07:00 420
8 1970-01-01 00:08:00 480
9 1970-01-01 00:09:00 540
... ... ...
9999990 1989-01-05 10:30:00 599999400
9999991 1989-01-05 10:31:00 599999460
9999992 1989-01-05 10:32:00 599999520
9999993 1989-01-05 10:33:00 599999580
9999994 1989-01-05 10:34:00 599999640
9999995 1989-01-05 10:35:00 599999700
9999996 1989-01-05 10:36:00 599999760
9999997 1989-01-05 10:37:00 599999820
9999998 1989-01-05 10:38:00 599999880
9999999 1989-01-05 10:39:00 599999940
[10000000 rows x 2 columns]
Check:
In [31]: pd.to_datetime(df.unix_ts, unit='s')
Out[31]:
0 1970-01-01 00:00:00
1 1970-01-01 00:01:00
2 1970-01-01 00:02:00
3 1970-01-01 00:03:00
4 1970-01-01 00:04:00
5 1970-01-01 00:05:00
6 1970-01-01 00:06:00
7 1970-01-01 00:07:00
8 1970-01-01 00:08:00
9 1970-01-01 00:09:00
...
9999990 1989-01-05 10:30:00
9999991 1989-01-05 10:31:00
9999992 1989-01-05 10:32:00
9999993 1989-01-05 10:33:00
9999994 1989-01-05 10:34:00
9999995 1989-01-05 10:35:00
9999996 1989-01-05 10:36:00
9999997 1989-01-05 10:37:00
9999998 1989-01-05 10:38:00
9999999 1989-01-05 10:39:00
Name: unix_ts, Length: 10000000, dtype: datetime64[ns]
Timing:
In [32]: %timeit pd.to_datetime(df.unix_ts, unit='s')
10 loops, best of 3: 156 ms per loop
Conclusion: I think 156 milliseconds for converting 10,000,000 rows is not that slow.
As @Gianmario Spacagna mentioned, for dates from 2018 onwards use %V with %G:
L1 = [43,44,51,2,5,12,52,53,1,2,5,52]
L2 = [2016,2016,2016,2017,2017,2017,2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['new'] = pd.to_datetime(df.Week.astype(str) + df.Year.astype(str).add('-1'), format='%V%G-%u')
print (df)
Week Year new
0 43 2016 2016-10-24
1 44 2016 2016-10-31
2 51 2016 2016-12-19
3 2 2017 2017-01-09
4 5 2017 2017-01-30
5 12 2017 2017-03-20
6 52 2018 2018-12-24
7 53 2018 2018-12-31
8 1 2019 2018-12-31
9 2 2019 2019-01-07
10 5 2019 2019-01-28
11 52 2019 2019-12-23
There is something fishy going on with weeks starting from 2019. The ISO-8601 standard assigns the 31st December 2018 to the week 1 of year 2019. The other approaches based on:
pd.to_datetime(df.Week.astype(str) + df.Year.astype(str).add('-2'), format='%W%Y-%w')
will give shifted results starting from 2019.
In order to be compliant with the ISO-8601 standard you would have to do the following:
import pandas as pd
import datetime
L1 = [52,53,1,2,5,52]
L2 = [2018,2018,2019,2019,2019,2019]
df = pd.DataFrame({"Week":L1,"Year":L2})
df['ISO'] = df['Year'].astype(str) + '-W' + df['Week'].astype(str) + '-1'
df['DT'] = df['ISO'].map(lambda x: datetime.datetime.strptime(x, "%G-W%V-%u"))
print(df)
It prints:
Week Year ISO DT
0 52 2018 2018-W52-1 2018-12-24
1 53 2018 2018-W53-1 2018-12-31
2 1 2019 2019-W1-1 2018-12-31
3 2 2019 2019-W2-1 2019-01-07
4 5 2019 2019-W5-1 2019-01-28
5 52 2019 2019-W52-1 2019-12-23
Week 53 of 2018 does not exist in the ISO calendar, so it is mapped to the same Monday as week 1 of 2019.
Please verify yourself on https://www.epochconverter.com/weeks/2019.
If you want to follow ISO Week Date
Weeks start with Monday. Each week's year is the Gregorian year in
which the Thursday falls. The first week of the year, hence, always
contains 4 January. ISO week year numbering therefore slightly
deviates from the Gregorian for some days close to 1 January.
The following sample code generates a sequence of 60 dates, starting from 18 Dec 2016 (a Sunday), and adds the appropriate columns.
It adds:
A "Date"
Week Day of the "Date"
Finds the Week Starting Monday of that "Date"
Finds the Year of the Week Starting Monday of that "Date"
Adds a Week Number (ISO)
Gets the Starting Monday Date, from Year and Week Number
Sample code below:
# Generate some dates
dft1 = pd.DataFrame(pd.date_range('2016-12-18', freq='D', periods=60))
dft1.columns = ['e_FullDate']
dft1['e_FullDateWeekDay'] = dft1.e_FullDate.dt.day_name().str.slice(0, 3)

# Add a week start date (Monday)
dft1['e_week_start'] = dft1['e_FullDate'] - pd.to_timedelta(dft1['e_FullDate'].dt.weekday, unit='D')
dft1['e_week_startWeekDay'] = dft1.e_week_start.dt.day_name().str.slice(0, 3)

# Add a week start year
dft1['e_week_start_yr'] = dft1.e_week_start.dt.year

# Add a week number of the week start Monday
dft1['e_week_no'] = dft1['e_week_start'].dt.week

# Regenerate the week start from week number and year
dft1['e_week_start_from_week_no'] = pd.to_datetime(dft1.e_week_no.astype(str) + dft1.e_week_start_yr.astype(str).add('-1'), format='%W%Y-%w')
dft1['e_week_start_from_week_noWeekDay'] = dft1.e_week_start_from_week_no.dt.day_name().str.slice(0, 3)

with pd.option_context('display.max_rows', 999, 'display.max_columns', 0, 'display.max_colwidth', 9999):
    display(dft1)
