I have a pandas dataframe, where one column contains a string for the year and quarter in the following format:
2015Q1
My Question:
How do I convert this into two columns, one for the year and one for the quarter?
You can use split, then cast column year to int and if necessary add Q to column q:
df = pd.DataFrame({'date':['2015Q1','2015Q2']})
print (df)
date
0 2015Q1
1 2015Q2
df[['year','q']] = df.date.str.split('Q', expand=True)
df.year = df.year.astype(int)
df.q = 'Q' + df.q
print (df)
date year q
0 2015Q1 2015 Q1
1 2015Q2 2015 Q2
Also you can use Period:
df['date'] = pd.to_datetime(df.date).dt.to_period('Q')
df['year'] = df['date'].dt.year
df['quarter'] = df['date'].dt.quarter
print (df)
date year quarter
0 2015Q1 2015 1
1 2015Q2 2015 2
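If you would rather not go through to_datetime first, a minimal sketch (assuming the strings always look like 2015Q1) is to build a quarterly PeriodIndex straight from the column. The variable name periods is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015Q1', '2015Q2']})

# Parse the "YYYYQn" strings directly as quarterly periods
periods = pd.PeriodIndex(df['date'], freq='Q')

# year and quarter are plain integer attributes of the PeriodIndex
df['year'] = periods.year
df['quarter'] = periods.quarter
print(df)
```

This leaves the original date column as strings; assign periods back to df['date'] if you want it stored as a period column instead.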
You could also construct a DatetimeIndex and call year and quarter on it.
df.index = pd.to_datetime(df.date)
df['year'] = df.index.year
df['quarter'] = df.index.quarter
date year quarter
date
2015-01-01 2015Q1 2015 1
2015-04-01 2015Q2 2015 2
Note that you don't even need dedicated columns for year and quarter if you have a DatetimeIndex; you can group on the index directly, for example: df.groupby(df.index.quarter)
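As a rough illustration of grouping on the index, here is a small sketch. The value column and the dates are made up for the example and are not from the question:

```python
import pandas as pd

# Hypothetical data: a 'value' column indexed by dates in Q1 and Q2 of 2015
df = pd.DataFrame(
    {'value': [10, 20, 30]},
    index=pd.to_datetime(['2015-01-15', '2015-02-15', '2015-04-15']),
)

# Group rows by the quarter of their index and sum each group
totals = df.groupby(df.index.quarter)['value'].sum()
print(totals)
```

If the data spans several years, grouping by quarter alone merges the same quarter of different years, so you would probably want to group by both df.index.year and df.index.quarter.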
I have the following data (I purposely created a DateTime column from the string column of dates because that's how I am receiving the data):
import numpy as np
import pandas as pd
data = pd.DataFrame({"String_Date" : ['10/12/2021', '9/21/2021', '2/12/2010', '3/25/2009']})
#Create DateTime columns
data['Date'] = pd.to_datetime(data["String_Date"])
data
String_Date Date
0 10/12/2021 2021-10-12
1 9/21/2021 2021-09-21
2 2/12/2010 2010-02-12
3 3/25/2009 2009-03-25
I want to add the following "Month & Year Date" column with entries that are comparable (i.e. can determine whether Oct-12 < Sept-21):
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 Oct-12
1 9/21/2021 2021-09-21 Sept-21
2 2/12/2010 2010-02-12 Feb-12
3 3/25/2009 2009-03-25 Mar-25
The "Month & Year Date" column doesn't have to be in the exact format I show above (although bonus points if it does), just as long as it shows both the month (abbreviated name, full name, or month number) and the year in the same column. Most importantly, I want to be able to groupby the entries in the "Month & Year Date" column so that I can aggregate data in my original data set across every month.
You can do:
data["Month & Year Date"] = (
data["Date"].dt.month_name() + "-" + data["Date"].dt.year.astype(str)
)
print(data)
Prints:
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 October-2021
1 9/21/2021 2021-09-21 September-2021
2 2/12/2010 2010-02-12 February-2010
3 3/25/2009 2009-03-25 March-2009
But if you want to group by month/year it's preferable to use:
data.groupby([data["Date"].dt.month, data["Date"].dt.year])
data['Month & Year Date'] = data['Date'].dt.strftime('%b') + '-' + data['Date'].dt.strftime('%y')
print(data)
Outputs:
String_Date Date Month & Year Date
0 10/12/2021 2021-10-12 Oct-21
1 9/21/2021 2021-09-21 Sep-21
2 2/12/2010 2010-02-12 Feb-10
3 3/25/2009 2009-03-25 Mar-09
You can use the .dt accessor to format your date field however you like. For your example, it'd look like this:
data['Month & Year Date'] = data['Date'].dt.strftime('%b-%y')
Although honestly I don't think that's the best representation for the purpose of sorting or evaluating greater than or less than. If what you want is essentially a truncated date, you could do this instead:
As a string -
data['Month & Year Date'] = data['Date'].dt.strftime('%Y-%m-01')
As a datetime object -
data['Month & Year Date'] = data['Date'].dt.to_period('M').dt.to_timestamp()
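Another option along the same lines, sketched here under the assumption that a monthly period is an acceptable "Month & Year" value, is to keep the period itself rather than converting it back to a timestamp. Periods compare chronologically and can be used as group keys. The names is_later and counts are just for illustration:

```python
import pandas as pd

data = pd.DataFrame({'String_Date': ['10/12/2021', '9/21/2021', '2/12/2010', '3/25/2009']})
data['Date'] = pd.to_datetime(data['String_Date'], format='%m/%d/%Y')

# Truncate each date to its month; the result displays like 2021-10
data['Month & Year Date'] = data['Date'].dt.to_period('M')

# Periods compare chronologically: October 2021 is later than September 2021
is_later = data['Month & Year Date'].iloc[0] > data['Month & Year Date'].iloc[1]

# The same column works as a groupby key for per-month aggregation
counts = data.groupby('Month & Year Date').size()
print(data)
print(is_later)
print(counts)
```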
You can use strftime. The available format codes are listed in the Python documentation for strftime.
data['Month Day'] = data['Date'].apply(lambda x:x.strftime('%b-%d'))
My dataset has dates in the European format, and I'm struggling to convert it into the correct format before I pass it through a pd.to_datetime, so for all day < 12, my month and day switch.
Is there an easy solution to this?
import pandas as pd
import datetime as dt
df = pd.read_csv(loc,dayfirst=True)
df['Date']=pd.to_datetime(df['Date'])
Is there a way to force datetime to acknowledge that the input is formatted as dd/mm/yy?
Thanks for the help!
Edit, a sample from my dates:
renewal["Date"].head()
Out[235]:
0 31/03/2018
2 30/04/2018
3 28/02/2018
4 30/04/2018
5 31/03/2018
Name: Earliest renewal date, dtype: object
After running the following:
renewal['Date']=pd.to_datetime(renewal['Date'],dayfirst=True)
I get:
Out[241]:
0 2018-03-31 #Correct
2 2018-04-01 #<-- this number is wrong and should be 01-04 instead
3 2018-02-28 #Correct
Specify the format explicitly:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
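To show the effect, here is a small sketch using dates like the ones in the question. The value 01/04/2018 is added by me as an ambiguous case and is not from the original sample:

```python
import pandas as pd

# Day-first strings; '01/04/2018' is ambiguous without an explicit format
renewal = pd.DataFrame({'Date': ['31/03/2018', '30/04/2018', '28/02/2018', '01/04/2018']})

# With an explicit format there is nothing to guess: day, then month, then 4-digit year
renewal['Date'] = pd.to_datetime(renewal['Date'], format='%d/%m/%Y')
print(renewal)
```

With the format spelled out, the last row should come out as 1 April rather than 4 January, and a value that does not match the format raises an error instead of being silently swapped.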
You can control the date construction directly if you define separate columns for 'year', 'month' and 'day', like this:
import pandas as pd
df = pd.DataFrame(
{'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)
date_parts = df['Date'].apply(lambda d: pd.Series(int(n) for n in d.split('/')))
date_parts.columns = ['day', 'month', 'year']
df['Date'] = pd.to_datetime(date_parts)
date_parts
# day month year
# 0 1 3 2018
# 1 6 8 2018
# 2 31 3 2018
# 3 30 4 2018
df
# Date
# 0 2018-03-01
# 1 2018-08-06
# 2 2018-03-31
# 3 2018-04-30
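A possible variation on the same idea, assuming every value has exactly three slash-separated parts, is to let str.split with expand=True build the three columns instead of calling apply row by row:

```python
import pandas as pd

df = pd.DataFrame(
    {'Date': ['01/03/2018', '06/08/2018', '31/03/2018', '30/04/2018']}
)

# Split "dd/mm/yyyy" into three string columns, then convert them to integers
date_parts = df['Date'].str.split('/', expand=True).astype(int)
date_parts.columns = ['day', 'month', 'year']

# to_datetime assembles dates from columns named year, month and day
df['Date'] = pd.to_datetime(date_parts)
print(df)
```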
I have a date column in my csv file
This is my Date column data
14/3/18
28/3/18
9/4/2018
How can I make every year parse as 2018?
I have tried this:
df['DateTime'] = pd.to_datetime(df['Date'])
print (df['DateTime'])
but it returns:
1 2018-03-14
2 2018-03-28
3 2018-09-04
In the last row, 09 is treated as the month, but 04 is supposed to be the month.
Add parameter dayfirst=True:
df['DateTime'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df)
Date DateTime
0 14/3/18 2018-03-14
1 28/3/18 2018-03-28
2 9/4/2018 2018-04-09
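Depending on your pandas version, a column that mixes two-digit and four-digit years may not parse in a single to_datetime call. As a fallback, here is a sketch that tries both day-first formats with the standard library. The helper name parse_day_first is hypothetical:

```python
from datetime import datetime

import pandas as pd


def parse_day_first(text):
    """Try the two-digit-year format first, then the four-digit one."""
    for fmt in ('%d/%m/%y', '%d/%m/%Y'):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'unrecognised date: {text!r}')


df = pd.DataFrame({'Date': ['14/3/18', '28/3/18', '9/4/2018']})

# Parse each string individually, then store the result as a datetime column
df['DateTime'] = pd.to_datetime(df['Date'].map(parse_day_first))
print(df)
```

This is slower than a vectorised parse because it works one string at a time, so it is mainly useful when the column really does mix formats.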
You can use .dt.strftime:
df['DateTime'] = pd.to_datetime(df['DateTime']).dt.strftime("%Y-%d-%m")
Output:
0 2018-14-03
1 2018-28-03
2 2018-04-09
Name: A, dtype: object
I ran into this scenario and don't know how to solve it.
I have a data frame where I am adding "week_of_year" and "year" columns based on the "date" column, which works fine.
import pandas as pd
df = pd.DataFrame({'date': ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01']})
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].apply(lambda x: x.weekofyear)
df['year'] = df['date'].apply(lambda x: x.year)
print(df)
Current Output
date week_of_year year
0 2018-12-31 1 2018
1 2019-01-01 1 2019
2 2019-12-31 1 2019
3 2020-01-01 1 2020
Expected Output
For 2018 and 2019, the last date of the year falls in the first week of the following year (2019 and 2020 respectively). So when the week is 1 but the date belongs to the previous calendar year, I want the year column to show the following year, as in the expected output:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
Try:
df['date'] = pd.to_datetime(df['date'])
df['week_of_year'] = df['date'].dt.isocalendar().week  # .dt.weekofyear was removed in pandas 2.0
df['year']=(df['date']+pd.to_timedelta(6-df['date'].dt.weekday, unit='d')).dt.year
Outputs:
date week_of_year year
0 2018-12-31 1 2019
1 2019-01-01 1 2019
2 2019-12-31 1 2020
3 2020-01-01 1 2020
A few things: generally avoid .apply(..).
For datetime columns you can work with the dates directly through the df[col].dt accessor.
Then, to get the last day of the week, add 6 - weekday days to the date, where weekday runs from 0 (Monday) to 6 (Sunday).
TLDR CODE
To get the week number as a series
df['DATE'].dt.isocalendar().week
To store the week in a new column, assign the returned series to it:
df['WEEK'] = df['DATE'].dt.isocalendar().week
TLDR EXPLANATION
Use pd.Series.dt.isocalendar().week to get the week number for a given series.
Note:
column "DATE" must be stored as a datetime column
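Since isocalendar() returns the ISO year alongside the week, it can also cover the "year" part of the question. Here is a sketch, assuming ISO 8601 week numbering is what is wanted and a pandas version that provides .dt.isocalendar():

```python
import pandas as pd

df = pd.DataFrame({'date': ['2018-12-31', '2019-01-01', '2019-12-31', '2020-01-01']})
df['date'] = pd.to_datetime(df['date'])

# isocalendar() returns a frame with 'year', 'week' and 'day' columns
iso = df['date'].dt.isocalendar()

df['week_of_year'] = iso['week']
# The ISO year is the year the week belongs to, not the calendar year of the date
df['year'] = iso['year']
print(df)
```

For these four dates the ISO year matches the expected output in the question, because 2018-12-31 and 2019-12-31 both fall in ISO week 1 of the following year.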
I have a data frame with 3 columns: time (which is in the format 'YYYY-MM-DDTHH:MM:SSZ'), device_id, and rain, but I need the first column, time, to become three columns of day, month, and year with values from the timestamp.
So the original data frame looks something like this:
time device_id rain
2016-12-27T00:00:00Z 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
2016-12-28T00:00:00Z 9b839362-b06d-4217-96f5-f261c1ada8d6 0.2
2016-12-29T00:00:00Z 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
2016-12-30T00:00:00Z 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
2016-12-31T00:00:00Z 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
But I'm trying to get the data frame to look like this:
day month year device_id rain
27 12 2016 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
28 12 2016 9b839362-b06d-4217-96f5-f261c1ada8d6 0.2
29 12 2016 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
30 12 2016 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
31 12 2016 9b839362-b06d-4217-96f5-f261c1ada8d6 NaN
I don't care about the hour/seconds/minutes but need these values from the original time stamp, and I don't even know where to start. Please help!
Here's some reproducible code to get started:
>> import pandas as pd
>> df = pd.DataFrame([['2016-12-27T00:00:00Z', '9b839362-b06d-4217-96f5-f261c1ada8d6', 'NaN']], columns=['time', 'device_id', 'rain'])
>> print(df)
2016-12-27T00:00:00Z 9b849362-b06d-4217-96f5-f261c1ada8d6 NaN
The cleanest way is to use pandas' built-in datetime functions.
First, convert the column to datetime:
df["time"] = pd.to_datetime(df["time"])
Then, extract your information:
df["day"] = df['time'].map(lambda x: x.day)
df["month"] = df['time'].map(lambda x: x.month)
df["year"] = df['time'].map(lambda x: x.year)
Split the time on - or T; the first three elements correspond to the year, month and day. Concatenating them with the other two columns gives what you need:
pd.concat([df.drop('time', axis = 1),
(df.time.str.split("-|T").str[:3].apply(pd.Series)
.rename(columns={0:'year', 1:'month', 2:'day'}))], axis = 1)
An alternative close to @nlassaux's approach would be:
df['time'] = pd.to_datetime(df['time'])
df['year'] = df.time.dt.year
df['month'] = df.time.dt.month
df['day'] = df.time.dt.day
df.drop('time', axis=1, inplace=True)
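To end up with the exact column order from the question, the steps can be combined along these lines. This is only a sketch; the two sample rows are copied from the question and the names time and result are mine:

```python
import pandas as pd

df = pd.DataFrame(
    [['2016-12-27T00:00:00Z', '9b839362-b06d-4217-96f5-f261c1ada8d6', float('nan')],
     ['2016-12-28T00:00:00Z', '9b839362-b06d-4217-96f5-f261c1ada8d6', 0.2]],
    columns=['time', 'device_id', 'rain'],
)

# Parse the ISO 8601 strings once; the trailing Z marks them as UTC
time = pd.to_datetime(df['time'])

# Drop the original column and add the three date parts
result = df.drop(columns='time').assign(
    day=time.dt.day, month=time.dt.month, year=time.dt.year
)

# Reorder so the date parts come first, as in the desired output
result = result[['day', 'month', 'year', 'device_id', 'rain']]
print(result)
```

Unlike the inplace version, this leaves df untouched and returns a new frame.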