Converting a string that has day of the year to datetime - python

I have a string column that looks like below:
2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19
The numbers 24, 8, and 44 are days of the year, not days of the month.
How can I convert this to a datetime column in the format below?
2018-01-24 07:10:00
2018-01-08 12:01:20
2018-02-13 13:55:19
I have not been able to find anything about converting from a day of the year.

You need the format string '%Y-%j %H:%M:%S':
In[53]:
import datetime as dt
dt.datetime.strptime('2018-44 13:55:19', '%Y-%j %H:%M:%S')
Out[53]: datetime.datetime(2018, 2, 13, 13, 55, 19)
%j is the day of the year; strptime accepts it with or without leading zeros.
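As a small stdlib-only illustration (the sample dates 2020-60, 2019-60 and 2018-02-13 are made up for the example), the same day number maps to a different calendar date in leap and non-leap years, and strftime('%j') converts a date back to its zero-padded day number:

```python
from datetime import datetime

# Day 60 depends on whether the year is a leap year
leap = datetime.strptime('2020-60', '%Y-%j')    # 2020 is a leap year
common = datetime.strptime('2019-60', '%Y-%j')  # 2019 is not

print(leap.date())    # 2020-02-29
print(common.date())  # 2019-03-01

# strftime('%j') goes the other way and always zero-pads to three digits
day_of_year = datetime(2018, 2, 13).strftime('%Y-%j')
print(day_of_year)    # 2018-044
```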
For pandas:
In[59]:
import pandas as pd
import io
t="""2018-24 7:10:0
2018-8 12:1:20
2018-44 13:55:19"""
df = pd.read_csv(io.StringIO(t), header=None, names=['datetime'])
df
Out[59]:
datetime
0 2018-24 7:10:0
1 2018-8 12:1:20
2 2018-44 13:55:19
Use pd.to_datetime and pass the format parameter:
In[60]:
df['new_datetime'] = pd.to_datetime(df['datetime'], format='%Y-%j %H:%M:%S')
df
Out[60]:
datetime new_datetime
0 2018-24 7:10:0 2018-01-24 07:10:00
1 2018-8 12:1:20 2018-01-08 12:01:20
2 2018-44 13:55:19 2018-02-13 13:55:19
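If some rows might not match the format, a sketch like the following (the 'not a date' row is a hypothetical bad value) uses errors='coerce' so those rows become NaT instead of raising an error:

```python
import pandas as pd

raw = pd.Series(['2018-24 7:10:0', '2018-8 12:1:20', 'not a date'])

# errors='coerce' turns values that don't match the format into NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y-%j %H:%M:%S', errors='coerce')
print(parsed)
```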

You can use dateutil.relativedelta to add the day number to the first day of the year.
For example, this adds five days to the current date:
from datetime import datetime
from dateutil.relativedelta import relativedelta
datetime.now()+ relativedelta(days=5)
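Applied to the question's strings, the same idea might look like the minimal sketch below. parse_day_of_year is a hypothetical helper, and it assumes every value has exactly the 'YYYY-D H:M:S' layout shown in the question. Note the "- 1": day 1 is January 1 itself.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

def parse_day_of_year(value: str) -> datetime:
    """Parse strings like '2018-44 13:55:19' (year-dayofyear h:m:s)."""
    date_part, time_part = value.split(' ')
    year, day_of_year = (int(x) for x in date_part.split('-'))
    hour, minute, second = (int(x) for x in time_part.split(':'))
    start = datetime(year, 1, 1, hour, minute, second)
    # Day 1 is January 1 itself, so only (day_of_year - 1) days are added
    return start + relativedelta(days=day_of_year - 1)

print(parse_day_of_year('2018-44 13:55:19'))  # 2018-02-13 13:55:19
```

For a plain day offset like this, datetime.timedelta(days=...) from the standard library would give the same result without the dateutil dependency.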

The documentation at strftime.org lists %j as the day-of-year specifier. It is part of the C standard, and Python's time.strptime is implemented in pure Python (the _strptime module), so it should work on all platforms, not just my Mac.
Use time.strptime to parse the string into a time.struct_time. The output below has a newline inserted for readability:
>>> time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S')
time.struct_time(tm_year=2018, tm_mon=1, tm_mday=24, tm_hour=7,
tm_min=10, tm_sec=0, tm_wday=2, tm_yday=24, tm_isdst=-1)
time.strftime formats a struct_time as a string, so you can get what you need by applying it to the output of strptime:
>>> time.strftime('%Y-%m-%d %H:%M:%S',
... time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S'))
'2018-01-24 07:10:00'
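If you need a datetime object rather than a string, one option (a sketch, not the only way) is to build it from the first six fields of the struct_time, which are year, month, day, hour, minute and second:

```python
import time
from datetime import datetime

st = time.strptime('2018-24 7:10:0', '%Y-%j %H:%M:%S')
# The first six fields of a struct_time are year, month, day, hour, minute, second
as_datetime = datetime(*st[:6])
print(as_datetime)  # 2018-01-24 07:10:00
```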

Related

Pandas int type to date type

I am new to pandas and I am trying to convert an int column to a date column.
The int in the df is something like: 10712 (first day, then month, then year).
I tried solving this with:
df_date = pd.to_datetime(df['Date'], format='%d%m%Y')
but I always get the following value error:
time data '10712' does not match format '%d%m%Y' (match)
Thank you for your help :)
You should use %y (2-digit year) instead of %Y (4-digit year). But that is not enough.
The format %d%m%y converts 10712 to 10-07-2012, not to 1-07-2012 as you expect.
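That's because of the following feature of the underlying strptime: when used with the strptime() method, the leading zero is optional for %d and %m (among other directives).
With no separators in the string, %d greedily takes the first two digits, '10', and leaves '7' for %m.
A workaround is to convert the values to a format that strptime (and to_datetime) can parse unambiguously: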
>>> df = pd.DataFrame({'date': [10712, 20813, 30914]})
>>> df
date
0 10712
1 20813
2 30914
>>> df1 = df.date.astype(str).str.replace(r'(\d+)(\d\d)(\d\d)',
...                                       r'\2/\1/\3', regex=True)
>>> df1
0 07/1/12
1 08/2/13
2 09/3/14
>>> pd.to_datetime(df1)
0 2012-07-01
1 2013-08-02
2 2014-09-03
Use the %y specifier to parse a year without century digits. Note that, because the day has no leading zero, this yields 10 July 2012 rather than 1 July 2012:
In [654]: pd.to_datetime(10712, format='%d%m%y')
Out[654]: Timestamp('2012-07-10 00:00:00')
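Both answers above come down to the missing leading zero. Assuming the integers are DDMMYY values that only lost the day's leading zero when stored as int, another option is to pad them back to six digits with str.zfill before parsing:

```python
import pandas as pd

df = pd.DataFrame({'date': [10712, 20813, 30914]})

# Assumes DDMMYY values whose leading zero was lost when stored as int:
# zfill(6) restores it ('10712' -> '010712'), so %d%m%y is unambiguous
padded = df['date'].astype(str).str.zfill(6)
df['parsed'] = pd.to_datetime(padded, format='%d%m%y')
print(df)
```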

Date and time conversion in python pandas

A .csv file has a date column. When read into a pandas DataFrame and displayed, the date and time are displayed as:
2021-06-30 19:39:25
The desired format is 30-06-2021 19:39:25 (day-month-year).
How can this be changed?
Using the pandas.to_datetime method with an explicit format is more reliable. The format must describe how the dates are written in the file (here, day-month-year):
df['Date'] = pd.to_datetime(df['Date'] , format = '%d-%m-%Y %H:%M:%S')
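To illustrate that format describes the input rather than the output, here is a sketch with made-up strings that are already in day-month-year order; dt.strftime then renders them in that layout again:

```python
import pandas as pd

# Illustrative strings already stored as day-month-year in the file
df = pd.DataFrame({'Date': ['30-06-2021 19:39:25', '22-07-2021 19:39:25']})

# format describes the input layout; the result is a datetime64 column
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y %H:%M:%S')

# Render in the same layout again (this produces strings, not datetimes)
shown = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(shown)
```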
Try strftime on a datetime object:
>>> import datetime
>>> date = datetime.datetime(2021, 6, 30, 19, 39, 25)
>>> date.strftime('%d-%m-%Y %H:%M:%S')
'30-06-2021 19:39:25'
Try the following:
import pandas as pd
df = pd.DataFrame({'Date':['2021-06-30 19:39:25', '2021-07-22 19:39:25', '2021-08-18 19:39:25']})
# convert `Date` column to datetime
df['Date'] = pd.to_datetime(df['Date'])
# (if the strings were already day-month-year, parse them instead with
# pd.to_datetime(df['Date'], format='%d-%m-%Y %H:%M:%S'))
# Now convert to the desired format; note that this produces strings, not datetimes
df['Date'] = df['Date'].dt.strftime('%d-%m-%Y %H:%M:%S')
print(df['Date'])
0 30-06-2021 19:39:25
1 22-07-2021 19:39:25
2 18-08-2021 19:39:25
Name: Date, dtype: object

Dates go crazy when applying pd.to_datetime

I have a DataFrame with a string column in which some values have this format:
DD/MM/YYYY
and some with this other one:
DD/MM/YYYY HH:Mi:SS
If I try to convert everything to datetime like this
df['COLUMN'] = pd.to_datetime(df['COLUMN'])
The rows without HH:Mi:SS go crazy: the months are interpreted as days (and vice versa).
How can I avoid this and get a column with just a date format?
Example of column which goes crazy:
Before conversion:
DateTime
--------
02/07/2021
15/07/2021 18:16:00
After conversion:
DateTime
2021-02-07 (This is February!!)
2021-07-15 18:16:00
pandas.to_datetime has a built-in parameter, dayfirst, to specify that the day comes first.
You can use it as:
df['COLUMN'] = pd.to_datetime(df['COLUMN'], dayfirst=True)
Check out the documentation for more info; note that it describes dayfirst as a preference, not a strict rule.
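In pandas 2.0 and later, a column that mixes these two layouts may raise an error instead, because the format inferred from the first value is applied to every row. A sketch of a workaround for those versions combines format='mixed' with dayfirst=True (the two sample values are the ones from the question):

```python
import pandas as pd

s = pd.Series(['02/07/2021', '15/07/2021 18:16:00'])

# format='mixed' (pandas 2.0+) infers the layout element by element;
# dayfirst=True makes ambiguous values such as 02/07 read as 2 July
parsed = pd.to_datetime(s, format='mixed', dayfirst=True)
print(parsed)
```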
I believe the following achieves the desired output (may not be the fastest way)
import pandas as pd
df = pd.DataFrame({'date': ['15/07/2021 18:16:00', '02/07/2021']})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce').fillna(pd.to_datetime(df['date'], format="%d/%m/%Y %H:%M:%S", errors="coerce"))
print(df.head())
for date in df['date']:
    print(type(date))
Output:
date
0 2021-07-15 18:16:00
1 2021-07-02 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
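Since the question asks for a column with just the date, a possible follow-up (a sketch reusing the same sample data) is .dt.normalize(), which sets the time to midnight while keeping the datetime64 dtype; .dt.date would instead give Python date objects in an object column:

```python
import pandas as pd

df = pd.DataFrame({'date': ['15/07/2021 18:16:00', '02/07/2021']})

# Parse each layout separately; rows that don't match become NaT
date_only = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
with_time = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M:%S', errors='coerce')

# Combine, then drop the time of day while keeping the datetime64 dtype
df['date'] = date_only.fillna(with_time).dt.normalize()
print(df)
```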

Dataframe datetime switching month into days

I am trying to convert a day/month/Year Hours:Minutes column into just day and month. When I run my code, the conversion switches the months into days and the days into months.
You can find a copy of my dataframe with the one column I want to switch to Day/Month here
https://file.io/JkWl7fsBN0vl
Below is the code I am using to convert:
df = pd.read_csv('Example.csv')
df['DateTime'] = pd.to_datetime(df['DateTime'])
df.to_csv("output.csv", index=False)
Without knowing the exact DateTime format you are using (the link to the dataframe is broken), I'm going to use an example of
day/month/Year Hours:Minutes
05/09/2014 12:30
You can determine the exact format date code using this site
Essentially, to_datetime() has a format argument where you can pass the exact format when it is not obvious. This lets you state explicitly which field is the day and which is the month, so pandas stops swapping them.
>>> df = pd.DataFrame(['05/09/2014 12:30'], columns=['DateTime'])
>>> df
DateTime
0 05/09/2014 12:30
>>> df['DateTime'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y %H:%M')
>>> df
DateTime
0 2014-09-05 12:30:00
>>> df['day'] = df['DateTime'].dt.day
>>> df['month'] = df['DateTime'].dt.month
>>> df
DateTime day month
0 2014-09-05 12:30:00 5 9
>>> df['DD/MM'] = df['DateTime'].dt.strftime('%d/%m')
>>> df
DateTime day month DD/MM
0 2014-09-05 12:30:00 5 9 05/09
I'm unsure whether you want the day and month as separate columns or combined, so I provided a few examples; use the one you need and drop the DateTime column when you're done with it.
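To see the swap the question describes, compare the default parse with the explicit one on the same sample string (the example value from this answer):

```python
import pandas as pd

raw = pd.Series(['05/09/2014 12:30'])

# Without a format, pandas reads slash-separated dates month-first by default
default = pd.to_datetime(raw)
# With an explicit format, 05 is the day and 09 is the month
explicit = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')

print(default.dt.month.iloc[0], explicit.dt.month.iloc[0])  # 5 9
```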

Formating integer of year and week to datetime

I have a lot of dates in the format of
201801
201802
201803
...
201852
It is in the format YYYYWW.
I need it in any date format, so I can divide these dates into training and test phases.
For example, 2018-01-01.
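You can use the strptime() function from the datetime module to create the date from the strings. Appending '-1' with the %w directive supplies a weekday (1 = Monday), which strptime needs before it will use the %W week number: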
import datetime
dateString = "201801"
dateObject = datetime.datetime.strptime(dateString + '-1', "%Y%W-%w")
print(dateObject)
Output:
2018-01-01 00:00:00
I hope this is what you're looking for,
import datetime
datetime.datetime.strptime(date, "%Y%W")
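date is a value like 201801 as a string. Caution: strptime only uses %W when the day of the week is also specified, so this returns January 1 of the year for every week number (201852 included).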
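A stdlib-only sketch showing why the weekday matters for %W, and how %W weeks differ from ISO weeks via date.fromisocalendar (available since Python 3.8). The 2019 value is an extra example, not from the question:

```python
from datetime import date, datetime

# %W without a weekday is ignored by strptime: this parses as January 1
ignored = datetime.strptime('201852', '%Y%W')

# Appending a weekday (%w, 1 = Monday) makes strptime use the week number
monday_w = datetime.strptime('201852' + '-1', '%Y%W-%w')

# %W week 1 starts on the year's first Monday; ISO week 1 starts on the
# Monday of the week containing the first Thursday, so the two can differ
w_2019 = datetime.strptime('201901-1', '%Y%W-%w').date()
iso_2019 = date.fromisocalendar(2019, 1, 1)

print(ignored, monday_w, w_2019, iso_2019)
```

Which convention is right depends on how the YYYYWW values were produced; if they came from ISO week numbers, fromisocalendar is the safer choice.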
