I am currently working on a dataset of 8,000 rows.
I want to split my date column into day, month, and year. The dtype of the date column is object.
How can I convert the whole date column into day, month, and year?
A sample of the date of my dataset is shown below:
date
01-01-2016
01-01-2016
01-01-2016
01-01-2016
01-01-2016
df=pd.DataFrame(columns=['date'])
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
print(df)
dt=datetime.strptime('date',"%d-%m-%y")
print(dt)
This is the code I am using to split the date, but it raises this error:
ValueError: time data 'date' does not match format '%d-%m-%y'
If you have pandas you can do this:
import pandas as pd
# Recreate your dataframe
df = pd.DataFrame(dict(date=['01-01-2016']*6))
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')  # explicit day-first format, as in the question
# Create 3 new columns
df[['year','month','day']] = df.date.apply(lambda x: pd.Series(x.strftime("%Y,%m,%d").split(",")))
df
Returns
date year month day
0 2016-01-01 2016 01 01
1 2016-01-01 2016 01 01
2 2016-01-01 2016 01 01
3 2016-01-01 2016 01 01
4 2016-01-01 2016 01 01
5 2016-01-01 2016 01 01
Or without the formatting options:
df['year'],df['month'],df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day
df
Returns
date year month day
0 2016-01-01 2016 1 1
1 2016-01-01 2016 1 1
2 2016-01-01 2016 1 1
3 2016-01-01 2016 1 1
4 2016-01-01 2016 1 1
5 2016-01-01 2016 1 1
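The ValueError in the question comes from two things: datetime.strptime was given the literal string 'date' instead of the column's values, and %y expects a two-digit year while the data has four digits (%Y). A minimal sketch of the vectorized route, assuming the strings are day-first (dd-mm-yyyy); the sample values here are made up for illustration:

```python
import pandas as pd

# Illustrative sample; the real column has 8,000 rows of dd-mm-yyyy strings
df = pd.DataFrame({"date": ["01-01-2016", "15-02-2016", "31-12-2017"]})

# Parse the whole column at once; %Y (not %y) matches a four-digit year
parsed = pd.to_datetime(df["date"], format="%d-%m-%Y")

# The .dt accessor exposes the integer components of each datetime
df["day"] = parsed.dt.day
df["month"] = parsed.dt.month
df["year"] = parsed.dt.year
print(df)
```

Passing an explicit format avoids any ambiguity between day-first and month-first dates such as 02-01-2016.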
I have a dataframe df with Date column:
Date
--------
Wed 23 Dec
Sat 28 Nov
Thu 26 Nov
Sun 22 Nov
Tue 1 Dec
Wed 2 Dec
The Date column has object dtype. I want to change its format, using format="%m-%d-%Y", into yyyy-dd-mm.
Expected output df:
Date
---------
2020-23-12
2020-28-11
2020-26-11
2020-22-11
2020-01-12
2020-02-12
Thanks in advance for the help!
Use to_datetime with a format that matches the original data with the year appended; this gives a column of datetimes:
df['Date'] = pd.to_datetime(df['Date']+'2020', format="%a %d %b%Y")
print (df)
Date
0 2020-12-23
1 2020-11-28
2 2020-11-26
3 2020-11-22
4 2020-12-01
5 2020-12-02
If you need a custom format, add Series.dt.strftime, but note that the datetimes are lost and you get strings:
df['Date'] = pd.to_datetime(df['Date']+'2020', format="%a %d %b%Y").dt.strftime("%Y-%d-%m")
print (df)
Date
0 2020-23-12
1 2020-28-11
2 2020-26-11
3 2020-22-11
4 2020-01-12
5 2020-02-12
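Since the original strings carry no year, 2020 is an assumption. One way to sanity-check it is to compare the weekday abbreviation in the text with the weekday of the parsed date. A small sketch using three of the sample values (the names raw, assumed_year and weekday_ok are just for illustration):

```python
import pandas as pd

raw = pd.Series(["Wed 23 Dec", "Sat 28 Nov", "Tue 1 Dec"])
assumed_year = "2020"  # assumption: the data does not state the year

# Append the assumed year and parse with a matching format
parsed = pd.to_datetime(raw + " " + assumed_year, format="%a %d %b %Y")

# The weekday in the text should agree with the weekday of the parsed date
weekday_ok = parsed.dt.day_name().str[:3] == raw.str[:3]
print(weekday_ok.all())

# Custom yyyy-dd-mm strings (object dtype, no longer datetimes)
formatted = parsed.dt.strftime("%Y-%d-%m")
print(formatted.tolist())
```

If weekday_ok contains False for some rows, the assumed year is probably wrong for those rows.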
I need to extract date features (Day, Week, Month, Year) from a date column of a pandas data frame, using pandasql. I can't seem to locate what version of SQL pandasql is using so I am not sure how to accomplish this feat. Has anyone else tried something similar?
Here is what I have so far:
#import the needed libraries
import numpy as np
import pandas as pd
import pandasql as psql
#establish dataset
doc = 'room_data.csv'
df = pd.read_csv(doc)
df.head()
df2 = psql.sqldf('''
SELECT
Timestamp
, EXTRACT (DAY FROM "Timestamp") AS Day --DOES NOT WORK IN THIS VERSION OF SQL
, Temperature
, Humidity
FROM df
''')
df2.head()
As far as I know, pandasql runs its queries on SQLite, and SQLite does not support the EXTRACT() function.
You can try strftime('%d', Timestamp) instead:
psql.sqldf('''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
, Temperature
, Humidity
FROM df
''')
Consider the below example which demonstrates the above query:
Example dataframe:
np.random.seed(123)
dates = pd.date_range('01-01-2020','01-05-2020',freq='H')
temp = np.random.randint(0,100,97)
humidity = np.random.randint(20,100,97)
df = pd.DataFrame({"Timestamp":dates,"Temperature":temp,"Humidity":humidity})
print(df.head())
Timestamp Temperature Humidity
0 2020-01-01 00:00:00 66 29
1 2020-01-01 01:00:00 92 43
2 2020-01-01 02:00:00 98 34
3 2020-01-01 03:00:00 17 58
4 2020-01-01 04:00:00 83 39
Working Query:
import pandasql as ps
query = '''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
, Temperature
, Humidity
FROM df'''
print(ps.sqldf(query).head())
Timestamp Day Temperature Humidity
0 2020-01-01 00:00:00.000000 01 66 29
1 2020-01-01 01:00:00.000000 01 92 43
2 2020-01-01 02:00:00.000000 01 98 34
3 2020-01-01 03:00:00.000000 01 17 58
4 2020-01-01 04:00:00.000000 01 83 39
SQLite's strftime supports more format codes for extracting date parts; common ones are shown below:
import pandasql as ps
query = '''SELECT
Timestamp
, strftime('%d', Timestamp) AS Day
,strftime('%m', Timestamp) AS Month
,strftime('%Y', Timestamp) AS Year
,strftime('%H', Timestamp) AS Hour
, Temperature
, Humidity
FROM df'''
print(ps.sqldf(query).head())
Timestamp Day Month Year Hour Temperature Humidity
0 2020-01-01 00:00:00.000000 01 01 2020 00 66 29
1 2020-01-01 01:00:00.000000 01 01 2020 01 92 34
2 2020-01-01 02:00:00.000000 01 01 2020 02 98 90
3 2020-01-01 03:00:00.000000 01 01 2020 03 17 32
4 2020-01-01 04:00:00.000000 01 01 2020 04 83 74
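The question also asked for the week. A minimal sketch of the same strftime codes run directly against SQLite through Python's built-in sqlite3 module, so it does not depend on pandasql being installed; the timestamp is a made-up example. Note that strftime returns text, so CAST is used where an integer is wanted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# %d = day of month, %W = week of year (weeks start on Monday, 00-53),
# %m = month, %Y = year; all are returned as zero-padded text
row = conn.execute(
    "SELECT strftime('%d', ts), strftime('%W', ts), "
    "strftime('%m', ts), strftime('%Y', ts), "
    "CAST(strftime('%d', ts) AS INTEGER) "
    "FROM (SELECT '2020-01-15 04:00:00' AS ts)"
).fetchone()
conn.close()

print(row)
```

The same expressions should work inside a psql.sqldf query, since the answers above already use strftime there.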
Here you go:
df['year'] = pd.DatetimeIndex(df['date']).year
df['month'] = pd.DatetimeIndex(df['date']).month
df['day'] = pd.DatetimeIndex(df['date']).day
df.head(10)
Close year month day
Date
2014-12-31 31.4816 2014 12 31
2015-01-04 31.6416 2015 1 4
2015-01-05 31.8336 2015 1 5
2015-01-06 31.1168 2015 1 6
2015-01-08 31.7440 2015 1 8
2015-01-11 31.6736 2015 1 11
2015-01-12 32.4032 2015 1 12
2015-01-13 32.7744 2015 1 13
2015-01-14 33.9008 2015 1 14
2015-01-15 33.5936 2015 1 15
freq=None
So I want to add a frequency to my data in order to run a seasonal decomposition:
result = seasonal_decompose(df['Return'], model='add')
However, asfreq('D') is meant for a business-day MON-FRI week, and my data follows a SUN-THU week.
So I want to know what frequency I can use, or how I can adjust the 'D' frequency to a SUN-THU week.
Using the approach from https://www.geeksforgeeks.org/python-program-to-find-day-of-the-week-for-a-given-date/ you can find the day of the week for each row:
# Python program to find the day of
# the week for a given date
import datetime
import calendar
def findDay(date):
    # The format must match how the dates are stored; '%d %m %Y' expects strings like '31 12 2014'
    born = datetime.datetime.strptime(date, '%d %m %Y').weekday()
    return calendar.day_name[born]
# Driver program: strptime takes a single string, so apply the function to each value
df['day'] = df['Date'].apply(findDay)
After that, you can apply pandas value_counts() to find the frequency of each day:
df.day.value_counts()
Hope it answers your question.
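For the SUN-THU week itself, one option is a custom business-day offset with a weekmask. This is a sketch, not a tested recipe for the asker's data: the return values are made up, and the names sun_thu, returns and regular are hypothetical.

```python
import pandas as pd

# Business-day offset whose working days are Sunday through Thursday
sun_thu = pd.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu")

# Made-up returns with two missing trading days (Jan 6 and Jan 7)
returns = pd.Series(
    [0.1, 0.2, 0.3],
    index=pd.to_datetime(["2015-01-04", "2015-01-05", "2015-01-08"]),
)

# Reindex onto the regular SUN-THU calendar; missing days become NaN
regular = returns.asfreq(sun_thu)
print(regular)

# The same offset also generates a SUN-THU calendar directly
calendar = pd.date_range("2015-01-04", periods=10, freq=sun_thu)
print(calendar)
```

The NaN values would need filling (for example with ffill) before decomposition. seasonal_decompose also takes an explicit period argument, so passing period=5 for a five-day week may be simpler than relying on the index frequency.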
I need a column for the df that will be used to group it by weeks.
The problem is that all the reports in Tableau are built using the following format for the week: 2019-01-01, i.e. the first day of the week (Mon-Sun) repeated for every date in that week.
Data:
cw = pd.DataFrame({"lead_date": pd.to_datetime(["2019-01-01 00:02:16", "2018-08-01 00:02:16", "2017-07-07 00:02:16", "2015-12-01 00:02:16", "2016-09-01 00:02:16"]),
"name": ["aa", "bb", "cc", "dd", "EE"]})
My code:
# extracting
cw["week"] = cw["lead_date"].apply(lambda df: df.strftime("%W") )
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
Output:
lead_date year month week
2019-01-01 00:02:16, 2019 , 01 , 00
-
-
-
etc..
Desired output:
Having the week as a date rather than just 00 or 01, etc.:
lead_date year month week
2019-01-01 00:02:16, 2019 , 01 , 2019-01-01
2019-01-15 00:02:16, 2019 , 01 , 2019-01-14
2019-01-25 00:02:16, 2019 , 01 , 2019-01-21
2019-01-28 00:02:16, 2019 , 01 , 2019-01-21
You can do it like this:
from datetime import timedelta
# lead_date must already hold datetimes (e.g. via pd.to_datetime), not strings
cw['week'] = cw['lead_date'].apply(lambda r: (r - timedelta(days=r.weekday())).date())
This sets every date to the first day (Monday) of its week.
You can do it as follows using pandas.DatetimeIndex.dayofweek and pandas.Timedelta()
(note that the first day of the week containing 2019-01-01 is 2018-12-31):
import pandas as pd
cw = pd.DataFrame({"lead_date" : pd.DatetimeIndex([
"2019-01-01 00:02:16", "2018-08-01 00:02:16" , "2017-07-07 00:02:16",
"2015-12-01 00:02:16", "2016-09-01 00:02:16"]),
"name": ["aa","bb","cc", "dd", "EE"]})
# extracting
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
# subtract the weekday offset (Monday = 0), then drop the time of day
cw["week"] = (cw["lead_date"] - cw["lead_date"].dt.dayofweek *
pd.Timedelta(days=1)).dt.normalize()
print(cw[["lead_date", "year", "month", "week"]])
Out:
lead_date year month week
0 2019-01-01 00:02:16 2019 01 2018-12-31
1 2018-08-01 00:02:16 2018 08 2018-07-30
2 2017-07-07 00:02:16 2017 07 2017-07-03
3 2015-12-01 00:02:16 2015 12 2015-11-30
4 2016-09-01 00:02:16 2016 09 2016-08-29
I think this gets you the output you want:
cw = pd.DataFrame({ "lead_date" : [pd.to_datetime('2019-01-01 00:02:16'), pd.to_datetime('2018-08-01 00:02:16') , pd.to_datetime('2017-07-07 00:02:16'), pd.to_datetime('2015-12-01 00:02:16'), pd.to_datetime('2016-09-01 00:02:16')] ,
"name": ["aa","bb","cc", "dd", "EE"] })
cw["year"] = cw["lead_date"].apply(lambda df: df.strftime("%Y") )
cw["month"] = cw["lead_date"].apply(lambda df: df.strftime("%m") )
cw["week"] = cw["lead_date"].apply(lambda df: df.strftime("%Y-%m-%d") )
cw.drop(columns='name', inplace=True)
output:
lead_date year month week
0 2019-01-01 00:02:16 2019 01 2019-01-01
1 2018-08-01 00:02:16 2018 08 2018-08-01
2 2017-07-07 00:02:16 2017 07 2017-07-07
3 2015-12-01 00:02:16 2015 12 2015-12-01
4 2016-09-01 00:02:16 2016 09 2016-09-01
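Note that formatting lead_date as "%Y-%m-%d" returns the date itself, not the start of its week. Another vectorized way to get the Monday of each week is to go through weekly periods. A minimal sketch, using the dates from the question's desired output as sample data; weekly_counts is a hypothetical name for the grouped result:

```python
import pandas as pd

cw = pd.DataFrame({"lead_date": pd.to_datetime(
    ["2019-01-01 00:02:16", "2019-01-15 00:02:16", "2019-01-25 00:02:16"])})

# 'W' periods run Monday through Sunday; start_time is the Monday at 00:00
cw["week"] = cw["lead_date"].dt.to_period("W").dt.start_time

# The week column can then be used directly as a grouping key
weekly_counts = cw.groupby("week").size()
print(cw)
print(weekly_counts)
```

As in the answer above, 2019-01-01 maps to 2018-12-31, because that is the Monday of its week.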
I found this but can't get the syntax correct:
time.asctime(time.strptime('2017 28 1', '%Y %W %w'))
I want to set a new column to show the month in the format "201707" for July. It can be int64 or string; it doesn't have to be an actual readable date in the column.
My dataframe column ['Week'] is also in the format 201729 i.e. YYYYWW
dfAttrition_Billings_KPIs['Day_1'] = \
time.asctime(time.strptime(dfAttrition_Billings_KPIs['Week'].str[:4]
+ dfAttrition_Billings_KPIs['Month'].str[:-2] - 1 + 1', '%Y %W %w'))
So I want the rows that have week 201729 to show 201707 in a new month field. The output depends on the row's value in 'Week'.
I have a million records so would like to avoid iterations of rows, lambdas and slow functions where possible :)
Use to_datetime with the format parameter, appending 1 so that each week resolves to its Monday; then use strftime to get the YYYYMM format:
df = pd.DataFrame({'date':[201729,201730,201735]})
df['date1']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w')
df['date2']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w').dt.strftime('%Y%m')
print (df)
date date1 date2
0 201729 2017-07-17 201707
1 201730 2017-07-24 201707
2 201735 2017-08-28 201708
If you need to convert from datetimes to a custom week format:
df = pd.DataFrame({'date':pd.date_range('2017-01-01', periods=10)})
df['date3'] = df['date'].dt.strftime('%Y %W %w')
print (df)
date date3
0 2017-01-01 2017 00 0
1 2017-01-02 2017 01 1
2 2017-01-03 2017 01 2
3 2017-01-04 2017 01 3
4 2017-01-05 2017 01 4
5 2017-01-06 2017 01 5
6 2017-01-07 2017 01 6
7 2017-01-08 2017 01 0
8 2017-01-09 2017 02 1
9 2017-01-10 2017 02 2
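Since the question allows an int64 result, the month can also be built arithmetically from the parsed Monday instead of going through strftime. A short sketch under the same assumption as above (weeks are %W-style and represented by their Monday); the column names Week and Month follow the question:

```python
import pandas as pd

df = pd.DataFrame({"Week": [201729, 201730, 201735]})

# Append weekday 1 (Monday) so that year + week resolves to a single date
monday = pd.to_datetime(df["Week"].astype(str) + "1", format="%Y%W%w")

# Build YYYYMM as an integer, with no per-row Python function involved
df["Month"] = monday.dt.year * 100 + monday.dt.month
print(df)
```

A week that spans two months is assigned to the month of its Monday.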