Python: How do I extract the year from a date

I have a pandas DataFrame in which one column holds dates such as 1/7/13. I want to extract the year from each date. How would I do it?
I've tried
years_2 = df3.pivot_table(index=['ACCIDENT_DATE'], aggfunc ='size')
print(years_2)
but this counts how often each full date occurs, whereas I want to count how many times each year occurs. Something like this:
Year
2013 1000
2014 59882
2015 23232

datetime.strptime converts a string to a datetime object according to the format you give it. You can then read the year attribute from that object:
from datetime import datetime
datetime.strptime('1/7/13', '%d/%m/%y').year

If df3.ACCIDENT_DATE is of dtype datetime, then you can get the components of the date with .dt accessors
d = df3.ACCIDENT_DATE
# return series of dtype int
year = d.dt.year
month = d.dt.month
day = d.dt.day
If it has a time component:
# return series of dtype object, holding datetime.date and datetime.time values
date_ = d.dt.date
time_ = d.dt.time
# return series of dtype int
h = d.dt.hour
m = d.dt.minute
And many more in the docs

You can use the value_counts function to count how many times each year occurs. Apply it to the original DataFrame (df3), not to the pivot_table result:
df3["ACCIDENT_DATE"] = pd.to_datetime(df3["ACCIDENT_DATE"])
counts = df3["ACCIDENT_DATE"].dt.year.value_counts()
To get the year as a separate column:
df3["YEAR"] = df3["ACCIDENT_DATE"].dt.year
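As a minimal, self-contained sketch of the approach above: the sample rows are made up, and the format string assumes the dates are month/day/two-digit-year, which the question does not confirm. Sorting by index puts the years in ascending order, as in the desired output.

```python
import pandas as pd

# Hypothetical sample data; the real df3 is not shown in the question.
df3 = pd.DataFrame({"ACCIDENT_DATE": ["1/7/13", "3/15/13", "12/1/14"]})

# Assumption: month/day/two-digit year. Swap %m and %d if the data is day-first.
dates = pd.to_datetime(df3["ACCIDENT_DATE"], format="%m/%d/%y")

# Count rows per year and order the result by year rather than by count.
counts = dates.dt.year.value_counts().sort_index()
print(counts)
```

Passing an explicit format avoids pandas guessing between day-first and month-first for ambiguous strings such as 1/7/13.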

Related

Combining Year and DayOfYear, H:M:S columns into date time object

I have a time column with the format XXXHHMMSS where XXX is the Day of Year. I also have a year column. I want to merge both these columns into one date time object.
Before I had detached XXX into a new column but this was making it more complicated.
I've converted the two columns to strings
points['UTC_TIME'] = points['UTC_TIME'].astype(str)
points['YEAR_'] = points['YEAR_'].astype(str)
Then I have the following line:
points['Time'] = pd.to_datetime(points['YEAR_'] * 1000 + points['UTC_TIME'], format='%Y%j%H%M%S')
I'm getting a ValueError: time data '137084552' does not match format '%Y%j%H%M%S' (match)
It works fine for me if you combine both columns as strings, for example:
import pandas as pd
df = pd.DataFrame({'YEAR_': [2002, 2002, 2002],
'UTC_TIME': [99082552, 135082552, 146221012]})
pd.to_datetime(df['YEAR_'].astype(str) + df['UTC_TIME'].astype(str).str.zfill(9),
format="%Y%j%H%M%S")
# 0 2002-04-09 08:25:52
# 1 2002-05-15 08:25:52
# 2 2002-05-26 22:10:12
# dtype: datetime64[ns]
Note, since %j expects zero-padded day of year, you might need to zero-fill, see first row in the example above.

How to add month column to a date column in python?

date['Maturity_date'] = data.apply(lambda data: relativedelta(months=int(data['TRM_LNTH_MO'])) + data['POL_EFF_DT'], axis=1)
Tried this also:
date['Maturity_date'] = date['POL_EFF_DT'] + date['TRM_LNTH_MO'].values.astype("timedelta64[M]")
TypeError: 'type' object does not support item assignment
import pandas as pd
import datetime
#Convert the date column to date format
date['date_format'] = pd.to_datetime(date['Maturity_date'])
#Add a month column
date['Month'] = date['date_format'].apply(lambda x: x.strftime('%b'))
If you are using pandas, you can use its "frequency aliases" with date_range:
# periods=2 returns two dates at the 'M' (month-end) frequency, starting from
# the first month end on or after _existing_date.
# Note: newer pandas releases spell this alias 'ME'.
import pandas as pd
_new_period = pd.date_range(_existing_date, periods=2, freq='M')
Now you can take the second element returned:
# Index 1 is the following month end. Index 0 equals _existing_date only if that date is itself a month end.
_new_period.strftime('%Y-%m-%d')[1]
# You can format the result in other ways, for example year, month, or day only.
See the pandas documentation on frequency aliases for further information.
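Another option, sketched here with made-up data, is to add a per-row pd.DateOffset to the start date. The column names come from the question; the DataFrame is called df rather than date, on the assumption that the TypeError above arises because the name date refers to the datetime.date type instead of a DataFrame.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "POL_EFF_DT": pd.to_datetime(["2020-01-31", "2020-03-15"]),
    "TRM_LNTH_MO": [1, 12],
})

# Add a calendar-month offset row by row; DateOffset clips to the last
# valid day of the target month (Jan 31 + 1 month -> Feb 29 in 2020).
df["Maturity_date"] = [
    start + pd.DateOffset(months=int(months))
    for start, months in zip(df["POL_EFF_DT"], df["TRM_LNTH_MO"])
]
print(df)
```

Unlike a month-end date_range, this keeps the day of the month where the target month allows it.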

Convert Timestamp to Date only

I've been looking through every thread that I can find, and the only one that is relevant to this type of formatting issue is here, but it's for java...
How parse 2013-03-13T20:59:31+0000 date string to Date
I've got a column with values like 201604 and 201605 that I need to turn into date values like 2016-04-01 and 2016-05-01. To accomplish this, I've done what is below.
#Create Number to build full date
df['DAY_NBR'] = '01'
#Convert Max and Min date to string to do date transformation
df['MAXDT'] = df['MAXDT'].astype(str)
df['MINDT'] = df['MINDT'].astype(str)
#Add the day number to the max date month and year
df['MAXDT'] = df['MAXDT'] + df['DAY_NBR']
#Add the day number to the min date month and year
df['MINDT'] = df['MINDT'] + df['DAY_NBR']
#Convert Max and Min date to integer values
df['MAXDT'] = df['MAXDT'].astype(int)
df['MINDT'] = df['MINDT'].astype(int)
#Convert Max date to datetime
df['MAXDT'] = pd.to_datetime(df['MAXDT'], format='%Y%m%d')
#Convert Min date to datetime
df['MINDT'] = pd.to_datetime(df['MINDT'], format='%Y%m%d')
To be honest, I can work with this output, but it's a little messy because the unique values for the two columns are...
MAXDT Values
['2016-07-01T00:00:00.000000000' '2017-09-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-12-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-09-01T00:00:00.000000000' '2018-10-01T00:00:00.000000000'
'2016-04-01T00:00:00.000000000' '2018-03-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2019-11-01T00:00:00.000000000'
'2016-06-01T00:00:00.000000000' '2017-10-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2016-10-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2019-12-01T00:00:00.000000000'
'2016-09-01T00:00:00.000000000' '2017-08-01T00:00:00.000000000'
'2016-05-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-11-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2019-02-01T00:00:00.000000000'
'2019-07-01T00:00:00.000000000' '2019-10-01T00:00:00.000000000'
'2019-09-01T00:00:00.000000000' '2019-03-01T00:00:00.000000000'
'2019-05-01T00:00:00.000000000' '2019-04-01T00:00:00.000000000'
'2019-08-01T00:00:00.000000000' '2019-06-01T00:00:00.000000000'
'2020-02-01T00:00:00.000000000' '2020-01-01T00:00:00.000000000']
MINDT Values
['2016-04-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2017-10-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2018-09-01T00:00:00.000000000'
'2018-10-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2017-11-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-12-01T00:00:00.000000000'
'2016-10-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2018-03-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2016-06-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2016-07-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2016-09-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-05-01T00:00:00.000000000'
'2017-09-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000']
I'm trying to build a loop that runs through these dates, and it works, but I don't want to have an index with all of these irrelevant zeros and a T in it. How can I convert these empty timestamp values to just the date that is in yyyy-mm-dd format?
Thank you!
Unfortunately, I believe pandas always stores datetime values as datetime64[ns], so the precision is fixed. Even if you attempt to save them as datetime64[D], they will be cast back to datetime64[ns].
It's possible to store these values as strings instead, but the simplest solution is likely to strip the time component when you loop through them (i.e., loop over df['MAXDT'].to_numpy().astype('datetime64[D]')), or to reformat each value with strftime.
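A minimal sketch of both suggestions, using two made-up values in the 201604 style from the question. It also parses the year-month strings directly with the format '%Y%m', which defaults the day to 1 and so avoids appending '01' by hand.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's MAXDT column.
df = pd.DataFrame({"MAXDT": [201604, 201605]})

# Parse year-month directly; the missing day defaults to the 1st.
df["MAXDT"] = pd.to_datetime(df["MAXDT"].astype(str), format="%Y%m")

# Option 1: a NumPy array at day precision, convenient for looping.
days = df["MAXDT"].to_numpy().astype("datetime64[D]")

# Option 2: plain yyyy-mm-dd strings.
labels = df["MAXDT"].dt.strftime("%Y-%m-%d").tolist()

for day in days:
    print(day)
```

The column itself stays a datetime column; only the values used for looping or display change.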

Edit strings in every row of a column of a csv

I have a csv with a date column with dates listed as MM/DD/YY but I want to change the years from 00,02,03 to 1900, 1902, 1903 so that they are instead listed as MM/DD/YYYY
This is what works for me:
df2['Date'] = df2['Date'].str.replace(r'00', '1900')
but I'd have to do this for every year up until 68 (aka repeat this 68 times). I'm not sure how to create a loop to do the code above for every year in that range. I tried this:
ogyear=00
newyear=1900
while ogyear <= 68:
    df2['date']=df2['Date'].str.replace(r'ogyear','newyear')
    ogyear += 1
    newyear += 1
but this returns an empty data set. Is there another way to do this?
I can't use datetime because it assumes that 02 refers to 2002 instead of 1902, and when I try to edit that as a date I get an error message from Python saying that dates are immutable and that they must be changed in the original data set. For this reason I need to keep the dates as strings. I also attached the csv here in case that's helpful.
I would do it like this:
# create a data frame
d = pd.DataFrame({'date': ['20/01/00','20/01/20','20/01/50']})
# create year column
d['year'] = d['date'].str.split('/').str[2].astype(int) + 1900
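# strip the two-digit year from the old date, then append the four-digit year
# (regex=True is required in recent pandas, where str.replace is literal by default)
d['new_data'] = d['date'].str.replace(r'\d{2}$', '', regex=True) + d['year'].astype(str)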
       date  year    new_data
0  20/01/00  1900  20/01/1900
1  20/01/20  1920  20/01/1920
2  20/01/50  1950  20/01/1950
I'd do it the following way:
import pandas as pd
from datetime import datetime
# create a data frame with dates in the format month/day/two-digit year
d = pd.DataFrame({'dates': ['2/01/10','5/01/20','6/01/30']})
# loop through the dates column, parse each date with datetime and
# collect it in the desired form, then replace the column with the new list
# note: %y maps 00-68 to 2000-2068, so these become 2010, 2020 and 2030
new_dates = []
for date in list(d['dates']):
    dat = datetime.strptime(date, '%m/%d/%y').date()
    dat = dat.strftime("%m/%d/%Y")
    new_dates.append(dat)
d['dates'] = pd.Series(new_dates)
d
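If every two-digit year in the column belongs to the 1900s, as the question states, the loop can be replaced by a single regular-expression substitution that prefixes the trailing two digits with 19. This is only a sketch on made-up rows; the column name Date is taken from the question.

```python
import pandas as pd

# Hypothetical rows in the MM/DD/YY layout described in the question.
df2 = pd.DataFrame({"Date": ["01/15/00", "12/31/02", "07/04/68"]})

# Capture the last two digits of each string and put "19" in front of them.
# The $ anchor keeps months and days such as "00"-like substrings untouched.
df2["Date"] = df2["Date"].str.replace(r"(\d{2})$", r"19\1", regex=True)
print(df2)
```

The values stay strings throughout, so the 2002-versus-1902 ambiguity of %y never comes into play.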

Python Date Index: finding the closest date a year ago from today

I have a pandas DataFrame (stock prices) with a date index. It is daily, but only for working days.
I am trying to compute price performance year to date (YTD) and from a year ago.
To get the first date of the actual year in my dataframe I used the following method:
today = str(datetime.date.today())
curr_year = int(today[:4])
curr_month = int(today[5:7])
first_date_year = (df[str(curr_year)].first_valid_index())
Now I am trying to get the closest date a year ago (exactly one year before the last_valid_index()). I could extract the month and the year, but then it wouldn't be as precise. Any suggestions?
Thanks
Since you didn't provide any data, I am assuming that you have a list of dates (string types) like the following:
dates = ['11/01/2016', '12/01/2016', '02/01/2017', '03/01/2017']
You then need to transform that into datetime format, I would suggest using pandas:
pd_dates = pd.to_datetime(dates)
Then you have to define today and one year ago. I would suggest using datetime for that:
from datetime import datetime
today = datetime.today()
date_1yr_ago = datetime(today.year-1, today.month, today.day)
Lastly, filter the dates for those later than date_1yr_ago and take the first value of the result:
pd_dates[pd_dates > date_1yr_ago][0]
This returns the first date later than the date one year ago, assuming the dates are sorted in ascending order.
output:
Timestamp('2017-02-01 00:00:00')
You can convert that datetime value to string with the following code:
datetime.strftime(pd_dates[pd_dates > date_1yr_ago][0], '%Y/%m/%d')
output:
'2017/02/01'
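To find the date closest to one year before the last index entry, rather than the first later date, one possible sketch uses pd.DateOffset together with Index.get_indexer(method="nearest"). The price series below is made up, and the approach assumes the index is sorted. DateOffset(years=1) also copes with 29 February, where building datetime(year-1, month, day) by hand would raise an error.

```python
import pandas as pd

# Hypothetical working-day price series; 2017-02-04 and 2017-02-05 are a weekend.
idx = pd.to_datetime(
    ["2017-02-02", "2017-02-03", "2017-02-06", "2017-02-07", "2018-02-05"]
)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=idx)

last = prices.last_valid_index()
target = last - pd.DateOffset(years=1)  # 2017-02-05, a Sunday, not in the index

# Position of the index entry nearest to the target date (index must be sorted).
pos = prices.index.get_indexer([target], method="nearest")[0]
closest = prices.index[pos]
print(closest, prices.iloc[pos])
```

Here the Monday after the target is one day away and the Friday before is two days away, so the Monday is chosen.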
