Formatting an integer of year and week as a datetime - python

I have a lot of dates in the format of
201801
201802
201803
...
201852
It is in the format YYYYWW.
I need it in any date format, so I can divide these dates into training and test phases.
For example: 2018-01-01

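You can use the strptime() function from the datetime module to create the date from the strings. Appending '-1' supplies the weekday (Monday) that %W needs in order to resolve to a specific day: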
import datetime
dateString = "201801"
dateObject = datetime.datetime.strptime(dateString + '-1', "%Y%W-%w")
print(dateObject)
Output:
2018-01-01 00:00:00
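To connect this to the train/test split mentioned in the question, here is a minimal sketch that converts a handful of YYYYWW strings and splits them at a cutoff date. The helper name week_to_date and the cutoff of 2018-07-01 are assumptions made for illustration, not part of the original answer.

```python
from datetime import datetime

def week_to_date(yyyyww: str) -> datetime:
    """Return the Monday of the given week (hypothetical helper).

    %W counts weeks starting on Monday; the appended '-1' is the weekday.
    """
    return datetime.strptime(yyyyww + "-1", "%Y%W-%w")

weeks = ["201801", "201802", "201803", "201852"]
dates = [week_to_date(w) for w in weeks]

# Assumed cutoff: everything before July 2018 is training data.
cutoff = datetime(2018, 7, 1)
train = [d for d in dates if d < cutoff]
test = [d for d in dates if d >= cutoff]

print(train)
print(test)
```

If the week numbers follow ISO 8601 rather than the %W convention, the format "%G%V-%u" may be the better fit; for 2018 both conventions happen to give the same Mondays.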

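I hope this is what you're looking for. Note that strptime() ignores %W unless a day of the week is also given, so a weekday (here Monday) is appended:
import datetime
datetime.datetime.strptime(date + "-1", "%Y%W-%w")
date is a value like 201801 as a string.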

Related

Dataframe datetime switching month into days

I am trying to convert a day/month/Year Hours:Minutes column into just day and month. When I run my code, the conversion switches the months into days and the days into months.
You can find a copy of my dataframe with the one column I want to switch to Day/Month here
https://file.io/JkWl7fsBN0vl
Below is the code I am using to convert:
df =pd.read_csv('Example.csv')
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.to_csv("output.csv", index=False)
Without knowing the exact DateTime format you are using (the link to the dataframe is broken), I'm going to use an example of
day/month/Year Hours:Minutes
05/09/2014 12:30
You can look up the exact format codes in a strftime/strptime reference.
Essentially, to_datetime() has a format argument where you can pass the specific format when it cannot be inferred unambiguously. This lets you state explicitly that the first field is the day and the second is the month, rather than the other way round.
>>> df = pd.DataFrame(['05/09/2014 12:30'],columns=['DateTime'])
DateTime
0 05/09/2014 12:30
>>> df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')
DateTime
0 2014-09-05 12:30:00
>>> df['day'] = df['DateTime'].dt.day
>>> df['month'] = df['DateTime'].dt.month
DateTime day month
0 2014-09-05 12:30:00 5 9
>>> df['DD/MM'] = df['DateTime'].dt.strftime('%d/%m')
DateTime day month DD/MM
0 2014-09-05 12:30:00 5 9 05/09
I'm unsure about the exact format you want the day and month available in (separate columns, combined), but I provided a few examples, so you can remove the DateTime column when you're done with it and use the one you need.
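As a compact version of the steps above, the following sketch parses day-first strings with an explicit format and also shows the dayfirst=True option as an alternative. The sample values are made up, since the original file is not available.

```python
import pandas as pd

# Made-up sample values in day/month/Year Hours:Minutes order.
raw = pd.Series(["05/09/2014 12:30", "25/12/2014 08:00"])

# Option 1: spell out the format so nothing has to be guessed.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

# Option 2: let pandas infer the format, but tell it the day comes first.
parsed_dayfirst = pd.to_datetime(raw, dayfirst=True)

# Keep only day and month as text.
day_month = parsed.dt.strftime("%d/%m")
print(day_month.tolist())
```

An explicit format is usually the safer choice, because a value that does not match raises an error instead of being silently reinterpreted.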

extract date only from pandas column

I have this column in pandas df:
'''
full_date
2020-12-02T08:11:30-0600
2020-12-02T02:11:50-0600
2020-12-03T08:56:29-0600
'''
I only need the date, hoping to have this column:
'''
date
2020-12-02
2020-12-02
2020-12-03
'''
I have tried to find the solution from previous questions, but still failed. If anyone can help, I will appreciate that a lot. thanks.
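In case your column is not a datetime type, you can convert it to that and then use the .dt accessor to get just the date. Note that utc=True converts the timestamps to UTC first, so times close to midnight can land on a different date than the local one: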
>>> df["date"] = df["full_date"].pipe(pd.to_datetime, utc=True).dt.date
>>> print(df)
full_date date
0 2020-12-02T08:11:30-0600 2020-12-02
1 2020-12-02T02:11:50-0600 2020-12-02
2 2020-12-03T08:56:29-0600 2020-12-03
You can convert the datetime very easily using this python code, if suitable.
from dateutil.parser import parse
var = "2020-12-02T08:11:30-0600"
parseddate = parse(var).date()
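Whether the local date or the UTC date is wanted matters for timestamps near midnight. The sketch below uses only the standard library and a hypothetical late-evening value (not one from the question) to show the difference.

```python
from datetime import datetime, timezone

# Hypothetical value: late evening at UTC-06:00.
stamp = "2020-12-02T23:11:30-0600"

# %z understands offsets written as -0600.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")

local_date = parsed.date()                         # date as written in the string
utc_date = parsed.astimezone(timezone.utc).date()  # date after shifting to UTC

print(local_date)  # 2020-12-02
print(utc_date)    # 2020-12-03
```

For the three rows in the question both approaches give the same dates, because none of the times is within six hours of midnight.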

Converting a string that has day of the year to datetime

I have a string column that looks like below:
2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19
The 24,8,44 that you see are the day of the year and not the date.
How can I convert this to datetime column in the below format ?
2018-01-24 07:10:00
2018-01-08 12:01:20
2018-02-13 13:55:19
I am unable to find anything related to converting day of the year ?
You need format string '%Y-%j %H:%M:%S'
In[53]:
import datetime as dt
dt.datetime.strptime('2018-44 13:55:19', '%Y-%j %H:%M:%S')
Out[53]: datetime.datetime(2018, 2, 13, 13, 55, 19)
%j is day of year
For pandas:
In[59]:
import pandas as pd
import io
t="""2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19"""
df = pd.read_csv(io.StringIO(t), header=None, names=['datetime'])
df
Out[59]:
datetime
0 2018-24 7:10:0
1 2018-8 12:1:20
2 2018-44 13:55:19
Use pd.to_datetime and pass format param:
In[60]:
df['new_datetime'] = pd.to_datetime(df['datetime'], format='%Y-%j %H:%M:%S')
df
Out[60]:
datetime new_datetime
0 2018-24 7:10:0 2018-01-24 07:10:00
1 2018-8 12:1:20 2018-01-08 12:01:20
2 2018-44 13:55:19 2018-02-13 13:55:19
You can use dateutil.relativedelta to add the day of the year to the first day of that year (subtracting 1, because January 1 is day 1).
Example:
from datetime import datetime
from dateutil.relativedelta import relativedelta
datetime(2018, 1, 1) + relativedelta(days=44 - 1)  # datetime(2018, 2, 13, 0, 0)
The documentation at strftime.org identifies the %j format specifier as handling day of the year. I don't know whether it's available on all platforms, but my Mac certainly has it.
Use time.strptime to convert the string to a time.struct_time. The output below has a newline inserted for reading convenience:
>>> import time
>>> time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S')
time.struct_time(tm_year=2018, tm_mon=1, tm_mday=24, tm_hour=7,
tm_min=10, tm_sec=0, tm_wday=2, tm_yday=24, tm_isdst=-1)
time.strftime formats a struct_time as a string, so you can get what you need by applying it to the output of strptime:
>>> time.strftime('%Y-%m-%d %H:%M:%S',
... time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S'))
'2018-01-24 07:10:00'
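Because %j is resolved against the year in the same string, the same day number can map to different calendar dates in leap and common years. A small sketch, with from_day_of_year as a hypothetical helper name and the day-60 inputs chosen for illustration:

```python
from datetime import datetime

def from_day_of_year(text: str) -> datetime:
    """Parse 'YYYY-DDD H:M:S' where DDD is the day of the year (hypothetical helper)."""
    return datetime.strptime(text, "%Y-%j %H:%M:%S")

leap = from_day_of_year("2020-60 0:0:0")    # 2020 is a leap year
common = from_day_of_year("2018-60 0:0:0")  # 2018 is not

print(leap.strftime("%Y-%m-%d"))    # 2020-02-29
print(common.strftime("%Y-%m-%d"))  # 2018-03-01
```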

Pandas: datetime conversion from dtype object

I am working on a timeseries dataset which looks like this:
DateTime SomeVariable
0 01/01 01:00:00 0.24244
1 01/01 02:00:00 0.84141
2 01/01 03:00:00 0.14144
3 01/01 04:00:00 0.74443
4 01/01 05:00:00 0.99999
The date is without year. Initially, the dtype of the DateTime is object and I am trying to change it to pandas datetime format. Since the date in my data is without year, on using:
df['DateTime'] = pd.to_datetime(df.DateTime)
I am getting the error OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 01:00:00
I understand why I am getting the error (as it's not according to the pandas acceptable format), but what I want to know is how I can change the dtype from object to pandas datetime format without having year in my date. I would appreciate the hints.
EDIT 1:
Since, I got to know that I can't do it without having year in the data. So this is how I am trying to change the dtype:
df = pd.read_csv(some file location)
df['DateTime'] = pd.to_datetime('2018/'+df['DateTime'], format='%y%d/%m %H:%M:%S')
df.head()
On doing that, I am getting:
ValueError: time data '2018/ 01/01 01:00:00' doesn't match format specified.
EDIT 2:
Changing the format to '%Y/%m/%d %H:%M:%S'.
My data is hourly data, so it goes till 24h. I have only provided the demo data till 5h.
I was getting the space on adding the year to the DateTime. In order to remove that, this is what I did:
df['DateTime'] = pd.to_datetime('2018/'+df['DateTime'][1:], format='%Y/%m/%d %H:%M:%S')
I am getting the following error for that:
ValueError: time data '2018/ 01/01 02:00:00' doesn't match format specified
On changing the format to '%y/%m/%d %H:%M:%S' with the same code, this is the error I get:
ValueError: time data '2018/ 01/01 02:00:00' does not match format '%y/%m/%d %H:%M:%S' (match)
The problem is because of the gap after the year but I am not able to get rid of it.
EDIT 3:
I am able to get rid of the space after adding the year, however I am still not able to change the dtype.
df['DateTime'] = pd.to_datetime('2018/'+df['DateTime'].str.strip(), format='%Y/%m/%d %H:%M:%S')
ValueError: time data '2018/01/01 01:00:00' doesn't match format specified
I noticed that there are 2 spaces between the date and the time in the error, however adding 2 spaces in the format doesn't help.
EDIT 4 (Solution):
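Removed all the multiple whitespaces. Still the format was not matching. The problem was the time format: the hours in my data run from 1-24, while pandas supports 0-23. Simply changed the time 24:00:00 to 00:00:00 and it works perfectly now. (Keep in mind that 24:00:00 denotes midnight at the start of the following day, so the date may need to be shifted by one day as well.)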
This is not possible. A datetime object must have a year.
What you can do is ensure all years are aligned for your data.
For example, to convert to datetime while setting year to 2018:
df = pd.DataFrame({'DateTime': ['01/01 01:00:00', '01/01 02:00:00', '01/01 03:00:00',
'01/01 04:00:00', '01/01 05:00:00']})
df['DateTime'] = pd.to_datetime('2018/'+df['DateTime'], format='%Y/%m/%d %H:%M:%S')
print(df)
DateTime
0 2018-01-01 01:00:00
1 2018-01-01 02:00:00
2 2018-01-01 03:00:00
3 2018-01-01 04:00:00
4 2018-01-01 05:00:00
# Remove spaces. Have in mind this will remove all spaces.
df['DateTime'] = df['DateTime'].str.replace(" ", "")
# I'm assuming year does not matter (with no year in the format, pandas fills in 1900) and that 01/01 is in the format day/month.
df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d/%m%H:%M:%S')
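The hour-24 problem described in EDIT 4 can also be handled while parsing. The following is a minimal sketch using only the standard library; the helper name parse_hour_ending, the default year 2018, and the month/day field order are assumptions.

```python
from datetime import datetime, timedelta

def parse_hour_ending(stamp: str, year: int = 2018) -> datetime:
    """Parse 'MM/DD HH:MM:SS' where HH may be 24 (hypothetical helper)."""
    # split() with no argument also swallows repeated or leading whitespace.
    date_part, time_part = stamp.split()
    rolls_over = time_part.startswith("24:")
    if rolls_over:
        # strptime only accepts hours 0-23, so parse as 00 and add a day afterwards.
        time_part = "00:" + time_part[3:]
    parsed = datetime.strptime(f"{year}/{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    return parsed + timedelta(days=1) if rolls_over else parsed

print(parse_hour_ending(" 01/01  23:00:00"))  # 2018-01-01 23:00:00
print(parse_hour_ending(" 01/01  24:00:00"))  # 2018-01-02 00:00:00
```

Applied to a DataFrame column, this could be used via df['DateTime'].map(parse_hour_ending), at the cost of being slower than a vectorized pd.to_datetime call.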

Convert date + hour to timestamp - pandas / python

I have a dataset where I have 2 columns in a data frame - Date in YYYY-MM-DD format and another column with Hour in format 0100 (for 1am) until 2300 (for 11pm).
Date Hour
2017-01-01 0200
2017-01-01 0400
etc
In order to get it ready for Time series mode, I want to convert these into datetime objects and concatenate these columns. Example output desired: 2017-01-01 01:00:00, etc
I have tried df['Date'] = pd.to_datetime(df['Date']) and converted this into datetime object, But I'm struggling with the Hour column. Please help
This is one way. The trick is to note that pd.to_datetime is actually quite flexible: it accepts strings of the format "YYYY-MM-DD HHMM".
I assume here that your Hour is given as a string (otherwise leading zeros are not possible).
import pandas as pd
df = pd.DataFrame({'Date': ['2017-01-01', '2017-01-01'],
'Hour': ['0200', '0400']})
# as per #COLDSPEED's suggestion
df['DateTime'] = pd.to_datetime(df['Date'] + ' ' + df['Hour'])
print(df)
# Date Hour DateTime
# 0 2017-01-01 0200 2017-01-01 02:00:00
# 1 2017-01-01 0400 2017-01-01 04:00:00
print(df.dtypes)
# Date object
# Hour object
# DateTime datetime64[ns]
# dtype: object
Previous version with pd.DataFrame.apply is possible but inefficient:
df['DateTime'] = df.apply(lambda x: x['Date'] + ' ' + x['Hour'], axis=1)\
.apply(pd.to_datetime)
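Rather than relying on format inference, the format can be passed explicitly. This sketch also zero-pads the hour first, under the assumption that the Hour column may have been read as integers (for example 200 instead of '0200'); the integer sample values are made up for illustration.

```python
import pandas as pd

# Assumed input: Hour was read as integers, so the leading zeros are gone.
df = pd.DataFrame({"Date": ["2017-01-01", "2017-01-01"],
                   "Hour": [200, 400]})

# Restore the four-digit HHMM text form.
hour_text = df["Hour"].astype(str).str.zfill(4)

# Explicit format: date, a space, then hours and minutes without a separator.
df["DateTime"] = pd.to_datetime(df["Date"] + " " + hour_text,
                                format="%Y-%m-%d %H%M")
print(df)
```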
