Combining Year and DayOfYear, H:M:S columns into date time object


I have a time column with the format XXXHHMMSS, where XXX is the day of year. I also have a year column. I want to merge both of these columns into one datetime object.
Previously I had split XXX into a separate column, but this made things more complicated.
I've converted the two columns to strings:
points['UTC_TIME'] = points['UTC_TIME'].astype(str)
points['YEAR_'] = points['YEAR_'].astype(str)
Then I have the following line:
points['Time'] = pd.to_datetime(points['YEAR_'] * 1000 + points['UTC_TIME'], format='%Y%j%H%M%S')
I'm getting this error: ValueError: time data '137084552' does not match format '%Y%j%H%M%S' (match)
Here is a photo of my columns and a link to the data

It works fine for me if you combine both columns as strings, e.g.:
import pandas as pd
df = pd.DataFrame({'YEAR_': [2002, 2002, 2002],
'UTC_TIME': [99082552, 135082552, 146221012]})
pd.to_datetime(df['YEAR_'].astype(str) + df['UTC_TIME'].astype(str).str.zfill(9),
format="%Y%j%H%M%S")
# 0 2002-04-09 08:25:52
# 1 2002-05-15 08:25:52
# 2 2002-05-26 22:10:12
# dtype: datetime64[ns]
Note that since %j expects a zero-padded day of year, you might need to zero-fill the time column; see the first row in the example above.
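The arithmetic idea from the question (year times a multiplier, plus the time value) can also work if both columns stay numeric. The time value has nine digits (DDDHHMMSS), so the year has to be shifted by 10**9 rather than 1000; and once YEAR_ has been converted to a string, `* 1000` repeats the string instead of multiplying it. Integer arithmetic also supplies the leading zero for days below 100 automatically. A minimal sketch, assuming both columns hold whole numbers or digit strings; `combine_year_doy_time` is a hypothetical helper name:

```python
import pandas as pd


def combine_year_doy_time(year: pd.Series, utc_time: pd.Series) -> pd.Series:
    """Combine a year column with a DDDHHMMSS column (DDD = day of year)."""
    # DDDHHMMSS has nine digits, so the year must be shifted by 10**9.
    # Working with integers avoids manual zero-padding of the day of year.
    combined = year.astype("int64") * 10**9 + utc_time.astype("int64")
    # e.g. 2002 and 99082552 -> "2002099082552" -> 2002-04-09 08:25:52
    return pd.to_datetime(combined.astype(str), format="%Y%j%H%M%S")


df = pd.DataFrame({"YEAR_": [2002, 2002, 2002],
                   "UTC_TIME": [99082552, 135082552, 146221012]})
df["Time"] = combine_year_doy_time(df["YEAR_"], df["UTC_TIME"])
print(df["Time"])
```

Because the helper casts to int64 first, it should also accept the string columns from the question unchanged.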

Related

Printing date from Year, Month and Day columns in Pandas

I am looking to add a new column, "date", to my pandas DataFrame. Below are the first 5 rows of my DataFrame:
First 5 rows of the dataframe
As seen in the image, the first column is the year, the second the month, and the third the day. Below is what I have tried:
df['Year'] = pd.to_datetime(df[['Year','Month','Day']])
But I keep getting the following error:
ValueError: cannot assemble the datetimes: time data '610101' does not match format '%Y%m%d' (match)
Any help would be appreciated.

Following up on my comment, I was able to reproduce the error and solve it by adding 1900 to the year:
import pandas as pd
df = pd.DataFrame({"year": [61,99], "month": [1, 2], "day": [3, 12]})
df["year"] = df["year"] + 1900
df['full_date'] = pd.to_datetime(df[['year','month','day']])
Output:
year month day full_date
0 1961 1 3 1961-01-03
1 1999 2 12 1999-02-12
There is a format parameter for the to_datetime method (see the docs), but for some reason I wasn't able to make it work:
df['full_date'] = pd.to_datetime(df[['year','month','day']], format="%y%m%d", infer_datetime_format=False)
This still throws the same error: even though I'm using %y, which should match a two-digit year, the error message still says the data does not match '%Y%m%d'.
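As far as I can tell, the format argument is not used when pd.to_datetime receives a DataFrame of year/month/day columns: pandas assembles the values itself as year*10000 + month*100 + day and parses them with '%Y%m%d', which is where '610101' (year 61, month 1, day 1) in the error comes from. So the century has to be fixed in the year column itself. If the data mixes 19xx and 20xx years, a pivot rule is one option; the helper name `to_full_year` and the pivot of 30 below are arbitrary assumptions, not pandas features:

```python
import pandas as pd


def to_full_year(yy: pd.Series, pivot: int = 30) -> pd.Series:
    """Map two-digit years to four digits.

    Hypothetical rule: values >= pivot become 19xx, values below it 20xx.
    """
    yy = yy.astype("int64")
    # where() keeps yy where the condition holds, otherwise uses yy + 100
    return yy.where(yy >= pivot, yy + 100) + 1900


df = pd.DataFrame({"year": [61, 99, 5], "month": [1, 2, 7], "day": [3, 12, 30]})
df["year"] = to_full_year(df["year"])
df["full_date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df)
```

If every year is known to be in the 1900s, the plain `+ 1900` above is simpler.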

Try this:
df.apply(lambda x:'%s %s %s' % (x['year'],x['month'], x['day']),axis=1)

What about first printing out what your selection actually contains?
print(df[['Year','Month','Day']])
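If the data really is "610101" (a two-digit year), then you would likely need to prefix the year with '19':
pd.to_datetime('19' + df['Year'].astype(str).str.zfill(2) + df['Month'].astype(str).str.zfill(2) + df['Day'].astype(str).str.zfill(2), format='%Y%m%d')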

How can I parse a dataframe which has only date and month columns to new column with pd.datetime format?

A small snippet from my dataframe
I have separate columns for month and day. I need to parse only the month and day into a pandas datetime type (other datetime types would also help) so that I can plot a time-series line plot.
I tried this code:
df['newdate'] = pd.to_datetime(df[['Days','Month']], format='%d%m')
but it threw an error:
KeyError: "['Days' 'Month'] not in index"
How should I approach this error?

An illustration of my comment: if you convert the columns to strings, you can join them and parse them with a format string as follows:
import pandas as pd
df = pd.DataFrame({'Month': [1,2,11,12], 'Days': [1,22,3,23]})
pd.to_datetime(df['Month'].astype(str)+' '+df['Days'].astype(str), format='%m %d')
# 0 1900-01-01
# 1 1900-02-22
# 2 1900-11-03
# 3 1900-12-23
# dtype: datetime64[ns]
You could also add a 'Year' column to your df with an arbitrary year number and use the method you originally intended:
df = pd.DataFrame({'Month': [1,2,11,12], 'Days': [1,22,3,23]})
df['Year'] = 2020
pd.to_datetime(df[['Year', 'Month', 'Days']])
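One caveat with the string approach: when no year is parsed, the year defaults to 1900, which is not a leap year, so a row for 29 February should fail to parse. Picking a leap year as the placeholder avoids that (2020 above happens to be one). A minimal sketch, assuming the question's Month and Days columns; `month_day_to_datetime` is a hypothetical helper name and 2000 an arbitrary leap year:

```python
import pandas as pd


def month_day_to_datetime(month: pd.Series, day: pd.Series, year: int = 2000) -> pd.Series:
    # `year` is only a placeholder; 2000 is a leap year, so 29 February is valid.
    parts = pd.DataFrame({"year": year, "month": month, "day": day})
    return pd.to_datetime(parts)


df = pd.DataFrame({"Month": [1, 2, 12], "Days": [1, 29, 23]})
df["newdate"] = month_day_to_datetime(df["Month"], df["Days"])
print(df)
```

For a time-series plot only the month and day matter, so the placeholder year is harmless as long as it is the same for every row.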

Convert Timestamp to Date only

I've been looking through every thread I can find, and the only one relevant to this type of formatting issue is the one below, but it's for Java:
How parse 2013-03-13T20:59:31+0000 date string to Date
I've got a column with values like 201604 and 201605 that I need to turn into date values like 2016-04-01 and 2016-05-01. To accomplish this, I've done what is below.
#Create Number to build full date
df['DAY_NBR'] = '01'
#Convert Max and Min date to string to do date transformation
df['MAXDT'] = df['MAXDT'].astype(str)
df['MINDT'] = df['MINDT'].astype(str)
#Add the day number to the max date month and year
df['MAXDT'] = df['MAXDT'] + df['DAY_NBR']
#Add the day number to the min date month and year
df['MINDT'] = df['MINDT'] + df['DAY_NBR']
#Convert Max and Min date to integer values
df['MAXDT'] = df['MAXDT'].astype(int)
df['MINDT'] = df['MINDT'].astype(int)
#Convert Max date to datetime
df['MAXDT'] = pd.to_datetime(df['MAXDT'], format='%Y%m%d')
#Convert Min date to datetime
df['MINDT'] = pd.to_datetime(df['MINDT'], format='%Y%m%d')
To be honest, I can work with this output, but it's a little messy because the unique values for the two columns are...
MAXDT Values
['2016-07-01T00:00:00.000000000' '2017-09-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-12-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-09-01T00:00:00.000000000' '2018-10-01T00:00:00.000000000'
'2016-04-01T00:00:00.000000000' '2018-03-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2019-11-01T00:00:00.000000000'
'2016-06-01T00:00:00.000000000' '2017-10-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2016-10-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2019-12-01T00:00:00.000000000'
'2016-09-01T00:00:00.000000000' '2017-08-01T00:00:00.000000000'
'2016-05-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-11-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2019-02-01T00:00:00.000000000'
'2019-07-01T00:00:00.000000000' '2019-10-01T00:00:00.000000000'
'2019-09-01T00:00:00.000000000' '2019-03-01T00:00:00.000000000'
'2019-05-01T00:00:00.000000000' '2019-04-01T00:00:00.000000000'
'2019-08-01T00:00:00.000000000' '2019-06-01T00:00:00.000000000'
'2020-02-01T00:00:00.000000000' '2020-01-01T00:00:00.000000000']
MINDT Values
['2016-04-01T00:00:00.000000000' '2017-07-01T00:00:00.000000000'
'2016-02-01T00:00:00.000000000' '2017-01-01T00:00:00.000000000'
'2017-02-01T00:00:00.000000000' '2018-12-01T00:00:00.000000000'
'2017-08-01T00:00:00.000000000' '2018-04-01T00:00:00.000000000'
'2017-10-01T00:00:00.000000000' '2019-01-01T00:00:00.000000000'
'2018-05-01T00:00:00.000000000' '2018-09-01T00:00:00.000000000'
'2018-10-01T00:00:00.000000000' '2016-01-01T00:00:00.000000000'
'2016-03-01T00:00:00.000000000' '2017-11-01T00:00:00.000000000'
'2017-05-01T00:00:00.000000000' '2018-07-01T00:00:00.000000000'
'2018-06-01T00:00:00.000000000' '2017-12-01T00:00:00.000000000'
'2016-10-01T00:00:00.000000000' '2018-02-01T00:00:00.000000000'
'2017-06-01T00:00:00.000000000' '2018-08-01T00:00:00.000000000'
'2018-03-01T00:00:00.000000000' '2018-11-01T00:00:00.000000000'
'2016-08-01T00:00:00.000000000' '2016-06-01T00:00:00.000000000'
'2018-01-01T00:00:00.000000000' '2016-07-01T00:00:00.000000000'
'2016-11-01T00:00:00.000000000' '2016-09-01T00:00:00.000000000'
'2017-04-01T00:00:00.000000000' '2016-05-01T00:00:00.000000000'
'2017-09-01T00:00:00.000000000' '2016-12-01T00:00:00.000000000'
'2017-03-01T00:00:00.000000000']
I'm trying to build a loop that runs through these dates, and it works, but I don't want an index full of irrelevant zeros and a T. How can I convert these timestamps, whose time part is always midnight, to just the date in yyyy-mm-dd format?
Thank you!

Unfortunately, I believe pandas stores datetime columns as datetime64[ns] (pandas 2.0 added s, ms and us resolutions, but there is no day resolution), so the time part is always there. Even if you attempt to convert to datetime64[D], the column won't stay at day resolution.
It's possible to store these values as strings instead, but the simplest solution is probably to strip the extra zeros when you loop through them (e.g., by using df['MAXDT'].to_numpy().astype('datetime64[D]') and looping over the resulting NumPy array), or to reformat them with datetime.
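A sketch of the two usual options, assuming the question's MAXDT column holds YYYYMM integers such as 201604 (the sample values below are made up). It also shows that format="%Y%m" parses YYYYMM directly with the day defaulting to the 1st, which would make the DAY_NBR helper column and the int/str round trips unnecessary:

```python
import pandas as pd

# Made-up stand-in for the question's YYYYMM values
df = pd.DataFrame({"MAXDT": [201604, 201605, 201604]})

# "%Y%m" parses YYYYMM; the missing day defaults to the 1st
df["MAXDT"] = pd.to_datetime(df["MAXDT"].astype(str), format="%Y%m")

unique_dates = df["MAXDT"].drop_duplicates()

# Option 1: plain strings in yyyy-mm-dd form
labels = unique_dates.dt.strftime("%Y-%m-%d").tolist()

# Option 2: datetime.date objects, which print without a time part
dates = unique_dates.dt.date.tolist()

for label in labels:
    print(label)
```

Strings are convenient for labels and printing; date objects keep comparisons and date arithmetic working inside the loop.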

KeyError: Timestamp when converting date in column to date

I'm trying to convert a whole column of datetimes into dates to use in a condition later on, but the following error keeps showing up:
KeyError: Timestamp('2010-05-04 10:15:55')
I've tried multiple things, but I'm currently stuck with the code below:
for d in df.column:
pd.to_datetime(df.column[d]).apply(lambda x: x.date())
Also, how do I format the column so I can use it in a statement as follows:
df = df[df.column > 2015-05-28]

Just adding an answer in case anyone else ends up here:
First, let's create a DataFrame with some dates, change the dtype to string and convert it back. With errors='ignore', if any value can't be parsed (say you had John Smith in row x), the input is returned unchanged, so the whole column stays as strings; with errors='coerce', John Smith would instead become NaT (Not a Time) and the rest of the column would be converted. Note that errors='ignore' is deprecated in recent pandas versions.
import pandas as pd
# Create a date range with a frequency of one day
rng = pd.date_range(start='01/01/18', end='01/01/19', freq='D')
# Pass it into a DataFrame
df = pd.DataFrame({'Date': rng})
print(df.dtypes)
# Date    datetime64[ns]
# Okay, let's cast it to str so we can convert it back
df['Date'] = df['Date'].astype(str)
print(df.dtypes)
# Date    object
# Now let's convert it back
df['Date'] = pd.to_datetime(df.Date, errors='ignore')
print(df.dtypes)
# Date    datetime64[ns]
# Okay, let's slice the DataFrame for your desired date
print(df.loc[df.Date > '2018-12-29'])
#           Date
# 363 2018-12-30
# 364 2018-12-31
# 365 2019-01-01
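For the original question, the KeyError most likely comes from the loop: `for d in df.column` yields the column's values (Timestamps), and `df.column[d]` then tries to use each Timestamp as an index label. No loop is needed. Also, an unquoted 2015-05-28 is not a date in Python (the leading zero in 05 is even a syntax error in Python 3), so the comparison needs a string or a pd.Timestamp. A minimal sketch with made-up data, keeping the question's placeholder column name `column`:

```python
import pandas as pd

# Made-up data; "column" is the placeholder name used in the question
df = pd.DataFrame({"column": pd.to_datetime(["2010-05-04 10:15:55",
                                             "2015-06-01 08:00:00",
                                             "2016-01-02 12:30:00"])})

# Vectorised conversion: no loop, so no label lookup by Timestamp
df["date_only"] = df["column"].dt.date

# Compare against a Timestamp rather than the bare expression 2015-05-28
recent = df[df["column"] > pd.Timestamp("2015-05-28")]
print(recent)
```

Filtering on the datetime64 column itself is usually preferable to filtering on `date_only`, because `.dt.date` produces an object column of Python dates.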

The answer as provided by @Datanovice:
df['your column'] = pd.to_datetime(df['your column'], errors='ignore')
Then inspect the dtype; it should be datetime64[ns]. If so, just do:
df.loc[df['your column'] > 'your-date']

Edit strings in every row of a column of a csv

I have a CSV with a date column whose dates are listed as MM/DD/YY, but I want to change the years from 00, 02, 03 to 1900, 1902, 1903 so that they are listed as MM/DD/YYYY instead.
This is what works for me:
df2['Date'] = df2['Date'].str.replace(r'00', '1900')
but I'd have to do this for every year up to 68 (i.e., repeat this 68 times). I'm not sure how to create a loop that runs the code above for every year in that range. I tried this:
ogyear=00
newyear=1900
while ogyear <= 68:
df2['date']=df2['Date'].str.replace(r'ogyear','newyear')
ogyear += 1
newyear += 1
but this returns an empty data set. Is there another way to do this?
I can't use datetime because it assumes that 02 refers to 2002 instead of 1902, and when I try to edit that as a date, I get an error message from Python saying that dates are immutable and must be changed in the original data set. For this reason, I need to keep the dates as strings. I also attached the CSV here in case that's helpful.

I would do it like this:
import pandas as pd
# create a data frame
d = pd.DataFrame({'date': ['20/01/00','20/01/20','20/01/50']})
# create year column
d['year'] = d['date'].str.split('/').str[2].astype(int) + 1900
# add new year into old date by replacing old year (regex=True is required in pandas 2.x)
d['new_data'] = d['date'].str.replace(r'[0-9]+$', '', regex=True) + d['year'].astype(str)
date year new_data
0 20/01/00 1900 20/01/1900
1 20/01/20 1920 20/01/1920
2 20/01/50 1950 20/01/1950

I'd do it the following way:
from datetime import datetime
import pandas as pd
# create a data frame with dates in format month/day/two-digit year
d = pd.DataFrame({'dates': ['2/01/10','5/01/20','6/01/30']})
# loop through the dates in the dates column and add them
# to a list in the desired form using the datetime library,
# then replace the dataframe's dates column with the new list
new_dates = []
for date in d['dates']:
    dat = datetime.strptime(date, '%m/%d/%y').date()
    # %y maps 00-68 to 2000-2068, so shift those back a century
    if dat.year >= 2000:
        dat = dat.replace(year=dat.year - 100)
    new_dates.append(dat.strftime("%m/%d/%Y"))
d['dates'] = new_dates
d
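The loop in the question can also be replaced by a single regex substitution that keeps the dates as strings, so datetime never has to guess the century. (In the question's loop, r'ogyear' is the literal text "ogyear", not the variable's value, so nothing gets replaced.) A minimal sketch, assuming every two-digit year belongs to the 1900s; the sample dates are made up:

```python
import pandas as pd

df2 = pd.DataFrame({"Date": ["01/15/00", "07/04/02", "12/31/68"]})

# Capture the two-digit year at the end of each MM/DD/YY string and prefix "19"
df2["Date"] = df2["Date"].str.replace(r"/(\d{2})$", r"/19\1", regex=True)
print(df2)
```

The `$` anchor ensures only the trailing year is touched, so a month or day such as "02" is never rewritten.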
